Discussion:
PROC SQL: Does GROUP BY imply ORDER BY?
(too old to reply)
Talbot Michael Katz
2005-08-04 22:15:00 UTC
Permalink
Hi.

Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY clause?
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, but
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I explicitly
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a variable
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?

-- TMK --
"The Macro Klutz"
Sigurd Hermansen
2005-08-05 00:19:38 UTC
Permalink
Talbot:
GROUP BY may or may not order a dataset by the GROUP BY list. ORDER BY =
orders the yield (result) of a query by the ORDER BY list. This means =
that ORDER BY always supercedes any ordering that a query happens to =
impose on its yield.
Sig

________________________________

From: owner-sas-***@listserv.uga.edu on behalf of Talbot Michael Katz
Sent: Thu 8/4/2005 6:15 PM
To: SAS-***@LISTSERV.UGA.EDU
Cc: Talbot Michael Katz
Subject: PROC SQL: Does GROUP BY imply ORDER BY?



Hi.

Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY =
clause?
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, but
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I explicitly
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a variable
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?

-- TMK --
"The Macro Klutz"
Talbot Michael Katz
2005-08-05 01:38:48 UTC
Permalink
Hi Sig.

Yes, I would expect the standard for SQL GROUP BY to specify grouping
only. What caught my attention, though, was that PROC SQL did an ORDER BY
that wasn't asked for, and even registered that fact in the data set's
metadata; it wasn't just an accident.

-- TMK --
"The Macro Klutz"
Post by Sigurd Hermansen
GROUP BY may or may not order a dataset by the GROUP BY list. ORDER BY
orders the yield (result) of a query by the ORDER BY list. This means that
ORDER BY always supercedes any ordering that a query happens to impose on
its yield.
Post by Sigurd Hermansen
Sig
________________________________
Sent: Thu 8/4/2005 6:15 PM
Cc: Talbot Michael Katz
Subject: PROC SQL: Does GROUP BY imply ORDER BY?
Hi.
Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY clause?
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, but
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I explicitly
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a variable
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?
-- TMK --
"The Macro Klutz"
Sigurd Hermansen
2005-08-05 01:49:01 UTC
Permalink
Talbot:
While the group by may have ordered the yield (output) of the query =
according to the GROUP BY list, SQL does not guarantee that ordering. =
When in doubt specify an ORDER BY clause with the same list.
Sig

________________________________

From: owner-sas-***@listserv.uga.edu on behalf of Talbot Michael Katz
Sent: Thu 8/4/2005 9:38 PM
To: SAS-***@LISTSERV.UGA.EDU; Sigurd Hermansen
Cc: Talbot Michael Katz
Subject: Re: PROC SQL: Does GROUP BY imply ORDER BY?



Hi Sig.

Yes, I would expect the standard for SQL GROUP BY to specify grouping
only. What caught my attention, though, was that PROC SQL did an ORDER =
BY
that wasn't asked for, and even registered that fact in the data set's
metadata; it wasn't just an accident.

-- TMK --
"The Macro Klutz"


On Thu, 4 Aug 2005 20:19:38 -0400, Sigurd Hermansen =
Post by Sigurd Hermansen
GROUP BY may or may not order a dataset by the GROUP BY list. ORDER BY
orders the yield (result) of a query by the ORDER BY list. This means =
that
ORDER BY always supercedes any ordering that a query happens to impose =
on
its yield.
Post by Sigurd Hermansen
Sig
________________________________
Sent: Thu 8/4/2005 6:15 PM
Cc: Talbot Michael Katz
Subject: PROC SQL: Does GROUP BY imply ORDER BY?
Hi.
Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY =
clause?
Post by Sigurd Hermansen
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, =
but
Post by Sigurd Hermansen
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I =
explicitly
Post by Sigurd Hermansen
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a =
variable
Post by Sigurd Hermansen
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?
-- TMK --
"The Macro Klutz"
Droogendyk, Harry
2005-08-05 01:44:50 UTC
Permalink
Yes, SAS does order the dataset by the GROUP BY variables, even if unasked to do so, but that's a handy feature IMHO. However, many RDBMSs don't, eg. Teradata which requires that you also specify a ORDER BY if you want your resulting data in GROUP BY order.

-----Original Message-----
From: owner-sas-***@listserv.uga.edu
[mailto:owner-sas-***@listserv.uga.edu]On Behalf Of Talbot Michael Katz
Sent: Thursday, August 04, 2005 9:39 PM
To: SAS-***@LISTSERV.UGA.EDU; Sigurd Hermansen
Cc: Talbot Michael Katz
Subject: Re: PROC SQL: Does GROUP BY imply ORDER BY?


Hi Sig.

Yes, I would expect the standard for SQL GROUP BY to specify grouping
only. What caught my attention, though, was that PROC SQL did an ORDER BY
that wasn't asked for, and even registered that fact in the data set's
metadata; it wasn't just an accident.

-- TMK --
"The Macro Klutz"
Post by Sigurd Hermansen
GROUP BY may or may not order a dataset by the GROUP BY list. ORDER BY
orders the yield (result) of a query by the ORDER BY list. This means that
ORDER BY always supercedes any ordering that a query happens to impose on
its yield.
Post by Sigurd Hermansen
Sig
________________________________
Sent: Thu 8/4/2005 6:15 PM
Cc: Talbot Michael Katz
Subject: PROC SQL: Does GROUP BY imply ORDER BY?
Hi.
Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY clause?
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, but
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I explicitly
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a variable
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?
-- TMK --
"The Macro Klutz"
__________________________________________________________________________________________________________________________________

This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations.
Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized.
If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.

Ce courrier électronique est confidentiel et protégé. L'expéditeur ne renonce pas aux droits et obligations qui s'y rapportent.
Toute diffusion, utilisation ou copie de ce message ou des renseignements qu'il contient par une personne autre que le (les) destinataire(s) désigné(s) est interdite.
Si vous recevez ce courrier électronique par erreur, veuillez m'en aviser immédiatement, par retour de courrier électronique ou par un autre moyen.
Francis Harvey
2005-08-05 15:22:15 UTC
Permalink
Talbot,

A simple test confirms Sig's warning about not always performing the
ORDER BY. Using a libname with a SQL Server data source, the GROUP BY
fails to result in an ordered data set. With the SQL Server table a:

a b
-- --
3 1
3 4
1 6
1 7

and SAS program:

libname test oledb provider=sqloledb properties=("data source"=X
"Database"=X "User ID"=X "Password"=X "initial catalog"=X) schema=dbo;
proc sql;
create table b as
select a, sum(b) as sumb
from test.a
group by a;
quit;
proc contents data=b;
run;

no ordering is imposed on output data set b. I would assume that there
may be cases other than passing a query for processing to an external
provider that also return an unordered data set and would not rely on
this feature.

Francis R Harvey III
WB 303, (301)294-3952
Post by Droogendyk, Harry
-----Original Message-----
Behalf Of Sigurd Hermansen
Sent: Thursday, August 04, 2005 9:49 PM
Subject: Re: PROC SQL: Does GROUP BY imply ORDER BY?
While the group by may have ordered the yield (output) of the
query according to the GROUP BY list, SQL does not guarantee
that ordering. When in doubt specify an ORDER BY clause with
the same list.
Sig
________________________________
Sent: Thu 8/4/2005 9:38 PM
Cc: Talbot Michael Katz
Subject: Re: PROC SQL: Does GROUP BY imply ORDER BY?
Hi Sig.
Yes, I would expect the standard for SQL GROUP BY to specify grouping
only. What caught my attention, though, was that PROC SQL
did an ORDER BY
that wasn't asked for, and even registered that fact in the data set's
metadata; it wasn't just an accident.
-- TMK --
"The Macro Klutz"
On Thu, 4 Aug 2005 20:19:38 -0400, Sigurd Hermansen
Post by Sigurd Hermansen
GROUP BY may or may not order a dataset by the GROUP BY
list. ORDER BY
orders the yield (result) of a query by the ORDER BY list.
This means that
ORDER BY always supercedes any ordering that a query happens
to impose on
its yield.
Post by Sigurd Hermansen
Sig
________________________________
Sent: Thu 8/4/2005 6:15 PM
Cc: Talbot Michael Katz
Subject: PROC SQL: Does GROUP BY imply ORDER BY?
Hi.
Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an
ORDER BY clause?
Post by Sigurd Hermansen
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY
v1" only, but
Post by Sigurd Hermansen
the PROC CONTENTS told me that the data set was ordered by
both v1 and
Post by Sigurd Hermansen
v2. If I leave the ORDER BY off, I get the same thing. If
I explicitly
Post by Sigurd Hermansen
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER
BY a variable
Post by Sigurd Hermansen
not in the GROUP BY clause, then it orders by that variable
only. If I
Post by Sigurd Hermansen
ORDER BY a subset of the GROUP BY variables, will it always
try to use
Post by Sigurd Hermansen
every variable in the GROUP BY clause in the ordering?
-- TMK --
"The Macro Klutz"
Howard Schreier <hs AT dc-sug DOT org>
2005-08-06 18:45:34 UTC
Permalink
What you observe is a consequence of optimization. GROUP BY processing
entails sorting, as does ORDER BY. When both are coded, PROC SQL examines
the requirements to see if there is one sort which will support both
specifications. If so, that's what it will do.

The optimization means that it won't cost you any processing time if you
code an explicit ORDER BY which is, de facto, not needed because it repeats
the GROUP BY.

All of the results you describe are compliant with your specifications. SAS
is allowed to give you more than what you asked for, in the form of
additional minor sort keys. If for some reason you don't want them, you can
code something like

order by v2, ranuni(123)

but that will force an extra sort.

As to the sort information recorded in the data set header, what SAS does
is record whatever is in fact known, even if it's more than you explicitly
requested. I think people would be upset if it did anything less.

If a table is built using a SELECT DISTINCT, I think you'll see all of the
variables identified as sort keys.

Another interesting possibility is implicit row ordering, which cannot be
captured in the header. For example, if you use

order by upcase(name)

a PROC CONTENTS will tell you that there is no sort order, even though it's
possible to do a case transformation and then BY processing without having
to do a sort.
Post by Talbot Michael Katz
Hi.
Does a GROUP BY in PROC SQL always result in ORDER BY of the group
variables unless a different ordering is specified in an ORDER BY clause?
I just did a PROC SQL with "GROUP BY v1, v2" and "ORDER BY v1" only, but
the PROC CONTENTS told me that the data set was ordered by both v1 and
v2. If I leave the ORDER BY off, I get the same thing. If I explicitly
do "ORDER BY v2" only, it orders by v2 AND v1. If I ORDER BY a variable
not in the GROUP BY clause, then it orders by that variable only. If I
ORDER BY a subset of the GROUP BY variables, will it always try to use
every variable in the GROUP BY clause in the ordering?
-- TMK --
"The Macro Klutz"
Loading...