PROC SQL--select DISTINCT

Discussion:

(too old to reply)

Tom White

2008-01-10 17:54:44 UTC

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Darryl Putnam

2008-01-10 18:34:45 UTC

Permalink

Post by Tom White
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money -http://www.mail.com/Money.aspx?cat=money

I assume you want list of the variable values?

Try this

proc summary data=FOO1 missing nway; /* you may or may not want the
nway option, try it both ways */
class VAR1-VAR5
output out=FOO2;
run;

-Darryl

toby dunn

2008-01-10 18:12:22 UTC

Permalink

Tom ,

Im not sure what you are trying to end up with. Your code and what I gather from your write up is basically after distinct values of soome variables but non distinct values of others. The structure doesnt makes sense, in order to get the distinct values you will have to eleminate some of the values from the non distinct values, in order to get the non distinct values you have to make the distinct values non distinct.

Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Terjeson, Mark

2008-01-10 18:15:25 UTC

Permalink

Hi Tom,

Does this get you closer to what you want?

data sample;
do i=1 to 10;
do j = 1 to 5;
do k = 1 to 3;
do m = 1 to 5;
do n = 1 to 3;
output;
end;
end;
end;
end;
end;
run;

* DISTINCT - unique combinations of all variables ;
* i.e. would get rid of duplicate records which ;
* the sample above has no dups. ;
proc sql;
create table result as
select
distinct
*
from
sample
;
quit;

* DISTINCT - unique combinations of most variables ;
* i.e. unique for i-j-k across the m-n ;
proc sql;
create table result as
select
distinct
i,j,k
from
sample
;
quit;

* COUNTs across the distinct unique combinations ;
* i.e. use GROUP BY INSTEAD OF DISTINCT ;
proc sql;
create table result as
select
i,j,k,
count(*) as count
from
sample
group by
i,j,k
;
quit;

Hope this is helpful.

Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investments

Russell Investments
Global Leaders in Multi-Manager Investing

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-***@LISTSERV.UGA.EDU] On Behalf Of Tom
White
Sent: Thursday, January 10, 2008 9:55 AM
To: SAS-***@LISTSERV.UGA.EDU
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Tom White

2008-01-10 18:20:33 UTC

Permalink

Toby,

What I am attempting to do is to select DISTINCT values for VAR1, VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am trying to solve maybe requires something like this.

T

----- Original Message -----
From: "toby dunn"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000

Tom ,

Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.

Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

toby dunn

2008-01-10 18:35:21 UTC

Permalink

Tom ,

Well when you grab the distinct values you now loose any linking back to the other 2 variables, perhaps this will head you in the right direction:

Proc SQL ;

Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;

Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;

Quit ;

If you need distinct values for Var-Var3 seperately then:

Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;

Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1, VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008

Richard A. DeVenezia

2008-01-10 20:42:07 UTC

Permalink

Hard to say for sure what you want...

This is one way, if you want every combination of x1,x2,x3 to cross with
every combination of x4,x5

------------------------------------------------
data foo;
do x1 = 1 to 4;
do x2 = 1 to 4;
do x3 = 1 to 4;
do x4 = 1 to 20;
do x5 = 1 to 20;
if ranuni(1234) < 0.01 then output;
end;end;end;end;end;
run;

proc sql;
create table bar as
select x1,x2,x3,x4,x5
from
(select distinct x1,x2,x3 from foo) as one
, (select distinct x4,x5 from foo) as two
;
quit;
------------------------------------------------

Richard A. DeVenezia
http://www.devenezia.com/

Tom White

2008-01-10 20:45:10 UTC

Permalink

Perhaps I am making this too difficult.

Given this dataset

ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0

I would like to sum up the FLAG field within each ID (in this case I only
show one ID, but in real data I have many such IDs) as follows:

For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.

For ID=200279098 and DATE=02Aug2005 I see only one claim so there is nothing going on here.
The sum of FLAG is 0.

For ID=200279098 and DATE=06Sep2005 I see four entries all of which are identical.
In this case the sum of FLAG will be 1. (Not 4.)

For ID=200279098 and DATE=01Nov2005 I see six entries, four of which are identical,
in which case sum of FLAG=1 and two remaining entries where FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.

So, the desired table will look like the one below keeping only ID and SUM_FLAG as the final fields of interest.

ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1

Perhaps there is an easy way to do this.

Thnak you.

----- Original Message -----
From: "toby dunn"
To: SAS-***@LISTSERV.UGA.EDU
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000

Tom ,

Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
right direction:

Proc SQL ;

Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;

Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;

Quit ;

If you need distinct values for Var-Var3 seperately then:

Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;

Toby Dunn

"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch

"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Make distant family not so distant with Windows Vista® + Windows Live.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Richard A. DeVenezia

2008-01-11 04:42:46 UTC

Permalink

Post by Tom White
Perhaps I am making this too difficult.
Given this dataset ...

You need a sub-query to obtain the rows that will be used in the
summarization. The sub-query is making tacit assumptions about the business
rules that cause the replicate nature of id, date, claim_id and flag.

I don't know why you would want to toss away the discriminating variable
'date'. Typically, discriminators are tossed away when creating a rollup
summary.

-----------------------------------------------------------
data foo;
input
ID DATE: date9. CLAIM_ID: $9. CODE_ID FLAG;
format date date9. flag 2.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
run;

proc sql;
create table summary as
select id, sum(flag) as flag_sum
from
( /* sub-query */
select distinct id, date, claim_id, flag
from foo
)
group by
id, date
;
quit;

--
Richard A. DeVenezia
http://www.devenezia.com/

Sigurd Hermansen

2008-01-10 20:37:25 UTC

Permalink

Tom:
A record, obs, row, or tuple of data typically represents a among field,
variable, column, or attribute values, no matter what database model you
are using. DISTINCT (as in SELECT DISTINCT ... or SELECT count(DISTINCT
x), ....) operates on all tuples in a relation. I don't see consistent
and valid rule that a compiler could follow to find (distinct VAR1),
distinct(VAR2), distinct(VAR3)... and still preserve relations of
attributes within tuples. For example, even though this relation has
distinct tuples,
x y
1 2
2 2

it has no reduced form that has the same information and distinct
values per attribute. What would distinct functions yield given
something as simple as this trivial relation?
S

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money

data _null_,

2008-01-10 21:13:50 UTC

Permalink

I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.

data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;

Post by Tom White
Perhaps I am making this too difficult.
Given this dataset
ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
I would like to sum up the FLAG field within each ID (in this case I only
For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
For ID=200279098 and DATE=02Aug2005 I see only one claim so there is nothing going on here.
The sum of FLAG is 0.
For ID=200279098 and DATE=06Sep2005 I see four entries all of which are identical.
In this case the sum of FLAG will be 1. (Not 4.)
For ID=200279098 and DATE=01Nov2005 I see six entries, four of which are identical,
in which case sum of FLAG=1 and two remaining entries where FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.
So, the desired table will look like the one below keeping only ID and SUM_FLAG as the final fields of interest.
ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1
Perhaps there is an easy way to do this.
Thnak you.
----- Original Message -----
From: "toby dunn"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000
Tom ,
Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
Proc SQL ;
Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;
Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;
Quit ;
Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Make distant family not so distant with Windows Vista(R) + Windows Live.
http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Sigurd Hermansen

2008-01-10 21:35:02 UTC

Permalink

Tom:
This solution includes the Date (needed to distinguish rows):
proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money

Tom White

2008-01-10 21:47:51 UTC

Permalink

Thank you data _null_. This seems to solve my problem. I just couldn't get anywehere with SQL and DISTINCT.

Also thank you to the the rest of the responders whose ideas I will very likely use very soon.

T

----- Original Message -----
From: "data _null_,"
To: "Tom White"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 15:13:50 -0600

I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.

data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;

Post by Tom White
Perhaps I am making this too difficult.
Given this dataset
ID DATE CLAIM_ID CODE_ID FLAG
200279098 01Aug2005 A16620005
92285 0
200279098 01Aug2005 A16620005
99243 0
200279098 02Aug2005 A16620006
92081 0
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 06Sep2005 A16880748
15823 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
67904 1
200279098 01Nov2005 A17313211
92285 0
200279098 01Nov2005 A17313211
92285 0
I would like to sum up the FLAG field within each ID (in this case I only
For ID=200279098 and DATE=01Aug2005 I see two claims (CLAIM_ID)
which are the same. (Note that CODE_ID are different).
Since FLAG=0 for both claims, sum of FLAG= 0 + 0 =0.
For ID=200279098 and DATE=02Aug2005 I see only one claim so there
is nothing going on here.
The sum of FLAG is 0.
For ID=200279098 and DATE=06Sep2005 I see four entries all of
which are identical.
In this case the sum of FLAG will be 1. (Not 4.)
For ID=200279098 and DATE=01Nov2005 I see six entries, four of
which are identical,
in which case sum of FLAG=1 and two remaining entries where
FLAG=0 and therefore sum FLAG=0.
Their total sum is SUM_FLAG=1+0=1.
So, the desired table will look like the one below keeping only
ID and SUM_FLAG as the final fields of interest.
ID SUM_FLAG
200279098 0
200279098 0
200279098 1
200279098 1
Perhaps there is an easy way to do this.
Thnak you.
----- Original Message -----
From: "toby dunn"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:35:21 +0000
Tom ,
Well when you grab the distinct values you now loose any linking
back to the other 2 variables, perhaps this will head you in the
Proc SQL ;
Create Table Distinct As
Select Distinct Var1 , Var2 , Var3
From Have ;
Create Table Need As
Select A.* , B.Var4 , B.Var5
From Distinct As A ,
Have As B ;
Quit ;
Proc SQL ;
Create Table Need As
Select A.Var1 , B.Var2 , C.Var3 , D.Var4 , D.Var5
From
( Select Distinct Var1 From Have ) As A ,
( Select Distinct Var2 From Have ) As B ,
( Select Distinct Var3 From Have ) As C ,
( Select Var4 , Var5 From Have ) As D
;
Quit ;
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 13:20:33 -0500
Subject: Re: PROC SQL--select DISTINCT
Toby,
What I am attempting to do is to select DISTINCT values for VAR1,
VAR3, and VAR3
but all values for VAR4 and VAR5. The data summation problem I am
trying to solve maybe requires something like this.
T
----- Original Message -----
From: "toby dunn"
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 18:12:22 +0000
Tom ,
Im not sure what you are trying to end up with. Your code and what
I gather from your write up is basically after distinct values of
soome variables but non distinct values of others. The structure
doesnt makes sense, in order to get the distinct values you will
have to eleminate some of the values from the non distinct values,
in order to get the non distinct values you have to make the
distinct values non distinct.
Toby Dunn
"Don't bail. The best gold is at the bottom of barrels of crap."
Randy Pausch
"Be prepared. Luck is where preparation meets opportunity."
Randy Pausch

Date: Thu, 10 Jan 2008 12:54:44 -0500
Subject: PROC SQL--select DISTINCT
Hello SAS-L
I have a SAS data set with many fields (variables).
Suppose I would like to write sql code like
proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;
SAS does not recognize the distinct function. In other words,
I cannot write
.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;
Is there any way to select multiple DISTINCT variables like I am
attempting to do above?
Thank you.
T
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Make distant family not so distant with Windows Vista(R) + Windows Live.

http://www.microsoft.com/windows/digitallife/keepintouch.mspx?ocid=TXT_TAGLM_CPC_VideoChat_distantfamily_012008

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_012008
--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Tom White

2008-01-10 21:58:57 UTC

Permalink

Thank you Sig !

This is the convoluted SQL I was trying to do.

I was looking at Richard's code and I was going to try to implement it.

Also data _Null_ solution works just fine for me as well.

Today I learned new SQL syntax:
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag

This looks like a sub-query syntax?

Thank you all.

----- Original Message -----
From: "Sigurd Hermansen"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 16:35:02 -0500

Tom:
This solution includes the Date (needed to distinguish rows):
proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money

--
Are we headed for a recession? Read more on the Money Portal
Mail.com Money - http://www.mail.com/Money.aspx?cat=money

Sigurd Hermansen

2008-01-10 22:26:49 UTC

Permalink

Tom:
So it is ... Actually the sub-query reads "(select
ID,Date,Claim_ID,CODE_ID,max(flag) as flag from claim group by
ID,Date,Claim_ID,Code_ID)". But, inside the parentheses, it's also basic
SQL query syntax. Thanks to the closure property of relational database
programming, every query and nested query at every level yields a
relation, and only a relation, so SQL and other relational database
languages support only one data structure. Of course a relational
variable (relvar) can have types that amount to degenerate relations
(the relational logic kind, not the family ones) such as a tuple (one
row), an attribute (one column), a tuple-attribute (cell), or an empty
relation. Probably more than you want to know at this point, but you may
need to know more about these finer points later.
S

-----Original Message-----
From: Tom White [mailto:***@mail.com]
Sent: Thursday, January 10, 2008 4:59 PM
To: Sigurd Hermansen; Tom White; sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT

Thank you Sig !

This is the convoluted SQL I was trying to do.

I was looking at Richard's code and I was going to try to implement it.

Also data _Null_ solution works just fine for me as well.

Today I learned new SQL syntax:
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag

This looks like a sub-query syntax?

Thank you all.

----- Original Message -----
From: "Sigurd Hermansen"
To: "Tom White" , sas-***@listserv.uga.edu
Subject: RE: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 16:35:02 -0500

Tom:
This solution includes the Date (needed to distinguish rows): proc sql;
create table test as select ID,Date,case when sum(flag)>1 then 1 else
sum(flag) end as flag
from (select ID,Date,Claim_ID,CODE_ID,max(flag) as flag
from claim group by ID,Date,Claim_ID,Code_ID
)
group by ID,Date
;
quit;

-----Original Message-----
From: owner-sas-***@listserv.uga.edu [mailto:owner-sas-***@listserv.uga.edu]
On Behalf Of Tom White
Sent: Thursday, January 10, 2008 12:55 PM
To: sas-***@listserv.uga.edu
Subject: PROC SQL--select DISTINCT

Hello SAS-L

I have a SAS data set with many fields (variables).

Suppose I would like to write sql code like

proc sq;
create table FOO2 as
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
VAR4, VAR5
from FOO1
group by VAR1, VAR2, VAR3, VAR4, VAR5;
quit;

SAS does not recognize the distinct function. In other words,
I cannot write

.....
select(distinct VAR1), distinct(VAR2), distinct(VAR3),
.....
quit;

Is there any way to select multiple DISTINCT variables like I am
attempting to do above?

Thank you.
T

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money

--
Are we headed for a recession? Read more on the Money Portal Mail.com
Money - http://www.mail.com/Money.aspx?cat=money

Richard Read Allen

2008-01-10 22:31:06 UTC

Permalink

Tom,

Here a possible SQL solution.

Richard

==================================================
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0

200279098 02Aug2005 A16620006 92081 0

200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1

200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;

proc sql;
create table claim1 as
select ID, Date, Claim_ID, max(flag) as sum_flag
from claim
group by ID, Date, Claim_ID;
quit;

proc print;
run;

Post by Tom White
Thank you data _null_. This seems to solve my problem. I just couldn't get anywehere with SQL and DISTINCT.
Also thank you to the the rest of the responders whose ideas I will very likely use very soon.
T
----- Original Message -----
From: "data _null_,"
To: "Tom White"
Subject: Re: PROC SQL--select DISTINCT
Date: Thu, 10 Jan 2008 15:13:50 -0600
I believe this gives the correct result. It uses 2 PROC SUMMARY's
which you may find objectionable.
data work.claim;
input ID:$10. DATE:date9. CLAIM_ID:$10. CODE_ID:$10. FLAG;
format date date9.;
cards;
200279098 01Aug2005 A16620005 92285 0
200279098 01Aug2005 A16620005 99243 0
200279098 02Aug2005 A16620006 92081 0
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 06Sep2005 A16880748 15823 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 67904 1
200279098 01Nov2005 A17313211 92285 0
200279098 01Nov2005 A17313211 92285 0
;;;;
run;
proc summary data=work.claim nway;
class id date claim_id code_id;
output out=work.claim0(drop=_:) idgroup(max(flag) out(flag)=);
run;
*proc print;
run;
proc summary data=work.claim0 nway;
class id date;
output out=work.claim1(drop=_:) sum(flag)=sum_flag;
run;
proc print;
run;