Discussion:
PROC SQL--select DISTINCT
(too old to reply)
Tom White
2008-01-10 17:54:44 UTC
Permalink
Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Darryl Putnam
2008-01-10 18:34:45 UTC
Permalink
Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
   create table FOO2 as
      select(distinct VAR1), distinct(VAR2), distinct(VAR3),
      VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
   .....
    select(distinct VAR1), distinct(VAR2), distinct(VAR3),
   .....
quit;
Is there any way to select multiple DISTINCT variables like I am attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money -http://www.mail.com/Money.aspx?cat=money
I assume you want list of the variable values?

Try this

proc summary data=FOO1 missing nway; /* you may or may not want the
nway option, try it both ways */
class VAR1-VAR5
output out=FOO2;
run;


-Darryl
toby dunn
2008-01-10 18:12:22 UTC
Permalink
Tom ,

Im not sure what you are trying to end up with. Your code and what I gather from your write up is basically after distinct values of soome variables but non distinct values of others. The structure doesnt makes sense, in order to get the distinct values you will have to eleminate some of the values from the non distinct values, in order to get the non distinct values you have to make the distinct values non distinct.




Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
Terjeson, Mark
2008-01-10 18:15:25 UTC
Permalink
Hi Tom,

Does this get you closer to what you want?




data sample;
do i=1 to 10;
do j = 1 to 5;
do k = 1 to 3;
do m = 1 to 5;
do n = 1 to 3;
output;
end;
end;
end;
end;
end;
run;

* DISTINCT - unique combinations of all variables ;
* i.e. would get rid of duplicate records which ;
* the sample above has no dups. ;
proc sql;
create table result as
select
distinct
*
from
sample
;
quit;


* DISTINCT - unique combinations of most variables ;
* i.e. unique for i-j-k across the m-n ;
proc sql;
create table result as
select
distinct
i,j,k
from
sample
;
quit;



* COUNTs across the distinct unique combinations ;
* i.e. use GROUP BY INSTEAD OF DISTINCT ;
proc sql;
create table result as
select
i,j,k,
count(*) as count
from
sample
group by
i,j,k
;
quit;





Hope this is helpful.


Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investments


Russell Investments
Global Leaders in Multi-Manager Investing






-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-***@LISTSERV.UGA.EDU] On Behalf Of Tom
White
Sent: Thursday, January 10, 2008 9:55 AM
To: SAS-***@LISTSERV.UGA.EDU
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Tom White
2008-01-10 18:20:33 UTC
Permalink
Toby,

What I am attempting to do is to select DISTINCT values for VAR1, VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am trying to solve maybe requires something like this.

T

----- Original Message -----
From: "toby dunn"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000



Tom ,

Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.




Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008


--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
toby dunn
2008-01-10 18:35:21 UTC
Permalink
Tom ,

Well when you grab the distinct values you now loose any linking back to the other 2 variables, perhaps this will head you in the right direction:

Proc SQL ;

Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;

Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;

Quit ;


If you need distinct values for Var-Var3 seperately then:

Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;




Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1, VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
Richard A. DeVenezia
2008-01-10 20:42:07 UTC
Permalink
Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am attempting to do above?
Hard to say for sure what you want...

This is one way, if you want every combination of x1,x2,x3 to cross with
every combination of x4,x5

------------------------------------------------
data foo;
do x1 = 1 to 4;
do x2 = 1 to 4;
do x3 = 1 to 4;
do x4 = 1 to 20;
do x5 = 1 to 20;
if ranuni(1234) < 0.01 then output;
end;end;end;end;end;
run;

proc sql;
create table bar as
select x1,x2,x3,x4,x5
from
(select distinct x1,x2,x3 from foo) as one
, (select distinct x4,x5 from foo) as two
;
quit;
------------------------------------------------

Richard A. DeVenezia
http://www.devenezia.com/
Tom White
2008-01-10 20:45:10 UTC
Permalink
Perhaps I am making this too difficult.

Given this dataset

ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0

I would like to sum up the FLAG field within each ID (in this case I only
show one ID, but in real data I have many such IDs) as follows:

For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.

For ID=200279098 and DATE=02Aug2005 I see only one claim so there is nothing going on here.
The sum of FLAG is 0.

For ID=200279098 and DATE=06Sep2005 I see four entries all of which are identical.
In this case the sum of FLAG will be 1. (Not 4.)

For ID=200279098 and DATE=01Nov2005 I see six entries, four of which are identical,
in which case sum of FLAG=1 and two remaining entries where FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.

So, the desired table will look like the one below keeping only ID and SUM_FLAG as the final fields of interest.


ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1


Perhaps there is an easy way to do this.

Thnak you.

----- Original Message -----
From: "toby dunn"
To: SAS-***@LISTSERV.UGA.EDU
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000


Tom ,

Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
right direction:

Proc SQL ;

Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;

Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;

Quit ;


If you need distinct values for Var-Var3 seperately then:

Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;




Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008


--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Richard A. DeVenezia
2008-01-11 04:42:46 UTC
Permalink
Post by Tom White
Perhaps I am making this too difficult.
Given this dataset ...
You need a sub-query to obtain the rows that will be used in the
summarization. The sub-query is making tacit assumptions about the business
rules that cause the replicate nature of id, date, claim_id and flag.

I don't know why you would want to toss away the discriminating variable
'date'. Typically, discriminators are tossed away when creating a rollup
summary.

-----------------------------------------------------------
data foo;
input
ID DATE: date9. CLAIM_ID: $9. CODE_ID FLAG;
format date date9. flag 2.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
run;

proc sql;
create table summary as
select id, sum(flag) as flag_sum
from
( /* sub-query */
select distinct id, date, claim_id, flag
from foo
)
group by
id, date
;
quit;
--
Richard A. DeVenezia
http://www.devenezia.com/
Sigurd Hermansen
2008-01-10 20:37:25 UTC
Permalink
Tom:
A record, obs, row, or tuple of data typically represents a among field,
variable, column, or attribute values, no matter what database model you
are using. DISTINCT (as in SELECT DISTINCT ... or SELECT count(DISTINCT
x), ....) operates on all tuples in a relation. I don't see consistent
and valid rule that a compiler could follow to find (distinct VAR1),
distinct(VAR2), distinct(VAR3)... and still preserve relations of
attributes within tuples. For example, even though this relation has
distinct tuples,
x y
1 2
2 2

it has no reduced form that has the same information and distinct
values per attribute. What would distinct functions yield given
something as simple as this trivial relation?
S

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT


Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money
data _null_,
2008-01-10 21:13:50 UTC
Permalink
I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.


data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;
Post by Tom White
Perhaps I am making this too difficult.
Given this dataset
ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
I would like to sum up the FLAG field within each ID (in this case I only
For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
For ID=200279098 and DATE=02Aug2005 I see only one claim so there is nothing going on here.
The sum of FLAG is 0.
For ID=200279098 and DATE=06Sep2005 I see four entries all of which are identical.
In this case the sum of FLAG will be 1. (Not 4.)
For ID=200279098 and DATE=01Nov2005 I see six entries, four of which are identical,
in which case sum of FLAG=1 and two remaining entries where FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.
So, the desired table will look like the one below keeping only ID and SUM_FLAG as the final fields of interest.
ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1
Perhaps there is an easy way to do this.
Thnak you.
----- Original Message -----
From: "toby dunn"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000
Tom ,
Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
Proc SQL ;
Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;
Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;
Quit ;
Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista(R) + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Sigurd Hermansen
2008-01-10 21:35:02 UTC
Permalink
Tom:
This solution includes the Date (needed to distinguish rows):
proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT


Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money
Tom White
2008-01-10 21:47:51 UTC
Permalink
Thank you data _null_. This seems to solve my problem. I just couldn't get anywehere with SQL and DISTINCT.

Also thank you to the the rest of the responders whose ideas I will very likely use very soon.

T



----- Original Message -----
From: "data _null_,"
To: "Tom White"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 15:13:50 -0600


I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.


data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;
Post by Tom White
Perhaps I am making this too difficult.
Given this dataset
ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005
92285 0
200279098 01Aug2005 A16620005
99243 0
200279098 02Aug2005 A16620006
92081 0
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
92285 0
200279098 01Nov2005 A17313211
92285 0
I would like to sum up the FLAG field within each ID (in this case I only
For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
For ID=200279098 and DATE=02Aug2005 I see only one claim so there
is nothing going on here.
The sum of FLAG is 0.
For ID=200279098 and DATE=06Sep2005 I see four entries all of
which are identical.
In this case the sum of FLAG will be 1. (Not 4.)
For ID=200279098 and DATE=01Nov2005 I see six entries, four of
which are identical,
in which case sum of FLAG=1 and two remaining entries where
FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.
So, the desired table will look like the one below keeping only
ID and SUM_FLAG as the final fields of interest.
ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1
Perhaps there is an easy way to do this.
Thnak you.
----- Original Message -----
From: "toby dunn"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000
Tom ,
Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
Proc SQL ;
Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;
Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;
Quit ;
Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista(R) + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Tom White
2008-01-10 21:58:57 UTC
Permalink
Thank you Sig !

This is the convoluted SQL I was trying to do.

I was looking at Richard's code and I was going to try to implement it.

Also data _Null_ solution works just fine for me as well.

Today I learned new SQL syntax:
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag

This looks like a sub-query syntax?

Thank you all.




----- Original Message -----
From: "Sigurd Hermansen"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 16:35:02 -0500


Tom:
This solution includes the Date (needed to distinguish rows):
proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT


Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money


--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Sigurd Hermansen
2008-01-10 22:26:49 UTC
Permalink
Tom:
So it is ... Actually the sub-query reads "(select
ID,Date,Claim_ID,CODE_ID,max(flag) as flag from claim group by
ID,Date,Claim_ID,Code_ID)". But, inside the parentheses, it's also basic
SQL query syntax. Thanks to the closure property of relational database
programming, every query and nested query at every level yields a
relation, and only a relation, so SQL and other relational database
languages support only one data structure. Of course a relational
variable (relvar) can have types that amount to degenerate relations
(the relational logic kind, not the family ones) such as a tuple (one
row), an attribute (one column), a tuple-attribute (cell), or an empty
relation. Probably more than you want to know at this point, but you may
need to know more about these finer points later.
S

-----Original Message-----
From: Tom White [mailto:***@mail.com]
Sent: Thursday, January 10, 2008 4:59 PM
To: Sigurd Hermansen; Tom White; sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT


Thank you Sig !

This is the convoluted SQL I was trying to do.

I was looking at Richard's code and I was going to try to implement it.

Also data _Null_ solution works just fine for me as well.

Today I learned new SQL syntax:
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag

This looks like a sub-query syntax?

Thank you all.




----- Original Message -----
From: "Sigurd Hermansen"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 16:35:02 -0500


Tom:
This solution includes the Date (needed to distinguish rows): proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT


Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money


--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money
Richard Read Allen
2008-01-10 22:31:06 UTC
Permalink
Tom,

Here a possible SQL solution.

Richard

==================================================
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;

proc sql;
create table claim1 as
select ID, Date, Claim_ID, max(flag) as sum_flag
from claim
group by ID, Date, Claim_ID;
quit;

proc print;
run;
Post by Tom White
Thank you data _null_. This seems to solve my problem. I just couldn't get anywehere with SQL and DISTINCT.
Also thank you to the the rest of the responders whose ideas I will very likely use very soon.
T
----- Original Message -----
From: "data _null_,"
To: "Tom White"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 15:13:50 -0600
I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;
Post by Tom White
Perhaps I am making this too difficult.
Given this dataset
ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005
92285 0
200279098 01Aug2005 A16620005
99243 0
200279098 02Aug2005 A16620006
92081 0
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
92285 0
200279098 01Nov2005 A17313211
92285 0
I would like to sum up the FLAG field within each ID (in this case I only
For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
For ID=200279098 and DATE=02Aug2005 I see only one claim so there
is nothing going on here.
The sum of FLAG is 0.
For ID=200279098 and DATE=06Sep2005 I see four entries all of
which are identical.
In this case the sum of FLAG will be 1. (Not 4.)
For ID=200279098 and DATE=01Nov2005 I see six entries, four of
which are identical,
in which case sum of FLAG=1 and two remaining entries where
FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.
So, the desired table will look like the one below keeping only
ID and SUM_FLAG as the final fields of interest.
ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1
Perhaps there is an easy way to do this.
Thnak you.
----- Original Message -----
From: "toby dunn"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000
Tom ,
Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
Proc SQL ;
Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;
Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;
Quit ;
Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Make distant family not so distant with Windows Vista(R) + Windows Live™.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
Tom White
2008-01-10 23:21:56 UTC
Permalink
As always, Thank you.
T

----- Original Message -----
From: "Sigurd Hermansen"
To: SAS-***@LISTSERV.UGA.EDU
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 17:26:49 -0500


Tom:
So it is ... Actually the sub-query reads "(select
ID,Date,Claim_ID,CODE_ID,max(flag) as flag from claim group by
ID,Date,Claim_ID,Code_ID)". But, inside the parentheses, it's also basic
SQL query syntax. Thanks to the closure property of relational database
programming, every query and nested query at every level yields a
relation, and only a relation, so SQL and other relational database
languages support only one data structure. Of course a relational
variable (relvar) can have types that amount to degenerate relations
(the relational logic kind, not the family ones) such as a tuple (one
row), an attribute (one column), a tuple-attribute (cell), or an empty
relation. Probably more than you want to know at this point, but you may
need to know more about these finer points later.
S

-----Original Message-----
From: Tom White [mailto:***@mail.com]
Sent: Thursday, January 10, 2008 4:59 PM
To: Sigurd Hermansen; Tom White; sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT


Thank you Sig !

This is the convoluted SQL I was trying to do.

I was looking at Richard's code and I was going to try to implement it.

Also data _Null_ solution works just fine for me as well.

Today I learned new SQL syntax:
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag

This looks like a sub-query syntax?

Thank you all.




----- Original Message -----
From: "Sigurd Hermansen"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 16:35:02 -0500


Tom:
This solution includes the Date (needed to distinguish rows): proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT


Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money


--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money


--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
b***@CLA.CO.UK
2008-01-11 10:39:41 UTC
Permalink
Now the problem has been solved the quick answer to your question is you
can only use the DISTINCT keyword once in a given proc sql statement.

So you would never write
Post by Tom White
select(distinct VAR1), distinct(VAR2), distinct(VAR3)
For exampple:

data x;
do i=1 to 100000;
a=round(ranuni(i)*100);
b=round(ranuni(i)*100);
c=round(ranuni(i)*100);
d=round(ranuni(i)*100);
output;
end;
drop i;
run;

proc sql;

create table dedupe as
select distinct a, b, c, d
from x;

create table dupes as
select a, b, c, d, count(*) as qty
from x
group by a, b, c, d
having qty>1;

quit;


HTH.
Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Post by Tom White
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
toby dunn
2008-01-11 14:28:20 UTC
Permalink
Ben ,

Technically no you can have more than one Distinct, however it does matter the context in which you use it in:

Data Have ;
Infile Cards ;
Input X Y ;
Cards ;
1 1
1 2
1 1
1 2
2 1
2 2
2 3
;
Run ;


Proc SQL ;
Create Table Need As
Select Distinct X , Count( Distinct Y ) As MyCnt
From Have
Group By X ;
Quit ;


Proc Print
Data = Need ;
Run ;


Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Fri, 11 Jan 2008 05:39:41 -0500
Subject: Re: PROC SQL--select DISTINCT
Now the problem has been solved the quick answer to your question is you
can only use the DISTINCT keyword once in a given proc sql statement.
So you would never write
Post by Tom White
select(distinct VAR1), distinct(VAR2), distinct(VAR3)
data x;
do i=1 to 100000;
a=round(ranuni(i)*100);
b=round(ranuni(i)*100);
c=round(ranuni(i)*100);
d=round(ranuni(i)*100);
output;
end;
drop i;
run;
proc sql;
create table dedupe as
select distinct a, b, c, d
from x;
create table dupes as
select a, b, c, d, count(*) as qty
from x
group by a, b, c, d
having qty>1;
quit;
HTH.
Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Post by Tom White
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Put your friends on the big screen with Windows Vista® + Windows Live™.
http://www.microsoft.com/windows/shop/specialoffers.mspx?ocid=TXT_TAGLM_CPC_MediaCtr_bigscreen_012008
Ben Powell
2008-01-11 14:35:33 UTC
Permalink
Hi Toby, good call as ever.

Rgds

-----Original Message-----
From: toby dunn [mailto:***@hotmail.com]
Sent: 11 January 2008 14:28
To: Ben Powell; sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT


Ben ,

Technically no you can have more than one Distinct, however it does
matter the context in which you use it in:

Data Have ;
Infile Cards ;
Input X Y ;
Cards ;
1 1
1 2
1 1
1 2
2 1
2 2
2 3
;
Run ;


Proc SQL ;
Create Table Need As
Select Distinct X , Count( Distinct Y ) As MyCnt
From Have
Group By X ;
Quit ;


Proc Print
Data = Need ;
Run ;


Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch
Date: Fri, 11 Jan 2008 05:39:41 -0500
Subject: Re: PROC SQL--select DISTINCT
Now the problem has been solved the quick answer to your question is
you can only use the DISTINCT keyword once in a given proc sql
statement.
So you would never write
Post by Tom White
select(distinct VAR1), distinct(VAR2), distinct(VAR3)
data x;
do i=1 to 100000;
a=round(ranuni(i)*100);
b=round(ranuni(i)*100);
c=round(ranuni(i)*100);
d=round(ranuni(i)*100);
output;
end;
drop i;
run;
proc sql;
create table dedupe as
select distinct a, b, c, d
from x;
create table dupes as
select a, b, c, d, count(*) as qty
from x
group by a, b, c, d
having qty>1;
quit;
HTH.
Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3), VAR4, VAR5
from FOO1 group by VAR1, VAR2, VAR3, VAR4, VAR5; quit;
SAS does not recognize the distinct function. In other words, I cannot
write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3), .....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Post by Tom White
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money
_________________________________________________________________
Put your friends on the big screen with Windows Vista(r) + Windows
Live(tm).
http://www.microsoft.com/windows/shop/specialoffers.mspx?ocid=TXT_TAGLM_
CPC_MediaCtr_bigscreen_012008

****************************************************************************
THE COPYRIGHT LICENSING AGENCY LIMITED
Registered Office:
SAFFRON HOUSE
6-10 KIRBY STREET
LONDON
EC1N 8TS
Company No. 1690026 (registered in England)

The contents of this email and any attachments are confidential to the intended recipient.
They may not be disclosed to, used by or copied in any way by anyone other than the
intended recipient.

Whilst any information and/or any opinion given is believed to be correct, it is not intended
to constitute legal advice; you should seek specific legal advice as appropriate.

Please note that CLA does not accept any responsibility for viruses and it is your
responsibility to scan or otherwise check this email and any attachments.
****************************************************************************
Ian Whitlock
2008-01-11 15:46:13 UTC
Permalink
Summary: Distinct
#iw-value=1

Tom White <***@MAIL.COM> wrote

SAS does not recognize the distinct function. In other words, I
cannot write

..... select(distinct VAR1), distinct(VAR2), distinct(VAR3), .....
quit;

To understand why, consider the data

X Y
-----
1 1
1 2

Which row would you eliminate to honor DISTINCT X? DISTINCT is a
property of the row not the column. In

select count(distinct x), count(distinct y)

DISTINCT is a property of one column which really involves a special
subquery. In other words, the DISTINCTs are distinctly different.

Ian Whitlock
Howard Schreier <hs AT dc-sug DOT org>
2008-01-12 15:00:44 UTC
Permalink
On Thu, 10 Jan 2008 15:31:06 -0700, Richard Read Allen
Post by Richard Read Allen
Tom,
Here a possible SQL solution.
Richard
==================================================
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc sql;
create table claim1 as
select ID, Date, Claim_ID, max(flag) as sum_flag
from claim
group by ID, Date, Claim_ID;
quit;
proc print;
run;
[snip]

The solution with MAX occurred to me also, but it works only because the
example is a special case. To generalize the example, change 67904 to X_X_X
in one line.

Here is a solution which does in fact make prominent use of DISTINCT:

proc sql;
create table flagsums as
select id, sum(flag) as sum_flag
from (select distinct * from claim)
group by id, date;
quit;

By the way, the target of the outer FROM clause is called an "in-line view"
rather than a "subquery", at least in the SAS documentation. Subquery vs.
in-line view: the syntax is the same, the context is different.

Loading...