Discussion:
Loop to merge multiple datasets to one dataset
(too old to reply)
Dan Litsky
2004-06-29 13:34:51 UTC
Permalink
I have 250 datasets in one directory all starting with the letters "ds" and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc. There are "holes" in the numbers.

I want to merge all of these datasets to one dataset containing demographic information. The multiple datasets all contain one field that I need on the new large dataset.

My merge might look like this:

Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;

I'd like to read through the entire directory and merge in a loop if I could. Is there a way to do this? I've check through the archives but must be using the wrong set of search words because I can't find anything that applies.

Thanks,
Dan




---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Gerstle, John
2004-06-29 13:45:27 UTC
Permalink
You could try something like this:

<given that all datasets are sorted by ID and TRIAL>

libname out '<where the data is>';

/*%global dsnumb dsname datasets;*For use only inside a macro*/

/***List all of the SAS datasets in a given SAS library**Case
Sensitive*/
PROC SQL noprint;
/**Get the number of datasets, if needed for later**/
select count(distinct memname) into :dsnumb
FROM DICTIONARY.TABLES WHERE LIBNAME='OUT';
/**This creates list of datasets with the two level name***/
SELECT compress(libname||'.'||memname) into :datasets separated by ' '
FROM DICTIONARY.TABLES WHERE LIBNAME='OUT';
/**This creates list of dataset names**/
SELECT memname into :dsname separated by ' '
FROM DICTIONARY.TABLES WHERE LIBNAME='OUT';
QUIT;
run;


data bigfile;
merge demofile (in=a)
&datasets;
by id trial;
if a;
run;

The macro variable &dsnumb can be used to choose each successive dataset
name in a loop if needed.


John Gerstle
CDC Information Technological Support Contract (CITS)
Biostatistician
-----Original Message-----
Dan
Litsky
Sent: Tuesday, June 29, 2004 9:35 AM
Subject: Loop to merge multiple datasets to one dataset
I have 250 datasets in one directory all starting with the letters
"ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc.
There are "holes" in the numbers.
I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one
field
that I need on the new large dataset.
Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;
I'd like to read through the entire directory and merge in a loop if
I
could. Is there a way to do this? I've check through the archives
but
must be using the wrong set of search words because I can't find
anything
that applies.
Thanks,
Dan
---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Groeneveld, Jim
2004-06-29 13:53:36 UTC
Permalink
Hi Dan,

Using PROC DATASETS make a dataset of all (data) members in a given library (directory). Process that dataset in a data step to select dataset members starting with ds (and a sequence number) and build a macro variable list with matching dataset names:
%LET List = ;
DATA .....
SET ..... /* dataset name as DsName */
IF <throw away all names not matching the criteria> .....;
CALL EXECUTE ('%LET List = &List ' || DsName || ');');
* optionally count here the number of names added using a sum statement;
* and assign the resulting total to a macro variable using CALL SYMPUT;
RUN;

Then, while processing the list in a macro loop, in each iteration add a dataset name to the dataset being merged:
Data bigfile;
merge demofile (in=a)
%MACRO Loop;
%LOCAL I;
%LET I = 1;
%LET Name = %SCAN ( &List, &I, %STR( ));
%DO %WHILE (&Name NE);
&Name (in=b&i keep=(id trial newvar .......)) /* no semicolon */
%LET I = %EVAL (&I+1);
%LET Name = %SCAN ( &List, &I, %STR( ));
%END;
%MEND;
%Loop /* no semicolon */
; /* semicolon ends MERGE */
by id trial;

If you store the number of matching dataset names in another macro variable as well, the macro lopp may be somwhat simplified. I hope this approaches the idea that you have.

Regards - Jim.
--
. . . . . . . . . . . . . . . .

Jim Groeneveld, MSc.
Biostatistician
Science Team
Vitatron B.V.
Meander 1051
6825 MJ Arnhem
Tel: +31/0 26 376 7365
Fax: +31/0 26 376 7305
***@Vitatron.com
www.vitatron.com

Showing statistically significant differences between football teams,
generally all sporters, requires larger samples than usually applied.

[common disclaimer]


-----Original Message-----
From: Dan Litsky [mailto:***@YAHOO.COM]
Sent: Tuesday, June 29, 2004 15:35
To: SAS-***@LISTSERV.UGA.EDU
Subject: Loop to merge multiple datasets to one dataset


I have 250 datasets in one directory all starting with the letters "ds" and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc. There are "holes" in the numbers.

I want to merge all of these datasets to one dataset containing demographic information. The multiple datasets all contain one field that I need on the new large dataset.

My merge might look like this:

Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;

I'd like to read through the entire directory and merge in a loop if I could. Is there a way to do this? I've check through the archives but must be using the wrong set of search words because I can't find anything that applies.

Thanks,
Dan




---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Dunn, Toby
2004-06-29 13:56:59 UTC
Permalink
Dan,

Try using a SQL statement with an into : <macro variable> to get your
datasets from your library.


Then if your goal is to merge them all together in one huge file with
demofile do something like the following;


%macro merge_demo;

Data bigfile;
merge demofile (in=a)

%do I = 1 to &stop;
%let dsn = %scan(<your macro var>, &I);

&dsn (in=b rename = (newvar = &dsn._newvar) keep=(id trial
&dsn._newvar));

%end;

by id trial;


You will noticed I renamed your newvar variable. I assumed that all
datasets to be merged had the same var and you have to have some way to
distinguish them from one another or they will be overwritten by the
next data set.

HTH
Toby Dunn



-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-***@LISTSERV.UGA.EDU] On Behalf Of Dan
Litsky
Sent: Tuesday, June 29, 2004 8:35 AM
To: SAS-***@LISTSERV.UGA.EDU
Subject: Loop to merge multiple datasets to one dataset

I have 250 datasets in one directory all starting with the letters "ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc.
There are "holes" in the numbers.

I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one field
that I need on the new large dataset.

My merge might look like this:

Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;

I'd like to read through the entire directory and merge in a loop if I
could. Is there a way to do this? I've check through the archives but
must be using the wrong set of search words because I can't find
anything that applies.

Thanks,
Dan




---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Bosch, Jules [PRDUS Non J&J]
2004-06-29 14:42:53 UTC
Permalink
Dan,

Some of the following SAS code may be helpful. With this code you don't
need to know how many members are in the library or their names, assuming
you want to process ALL members in the library.

HTH,

Jules Bosch


dm 'clear out'; dm 'clear log';



libname mylib '...';



/*************************************************************************

Use the SAS Data Dictionary to create a data set with all library

member names. LIBNAME must be available.

*************************************************************************/



%macro strings(dir);



proc sql noprint;



create table work.mem as



select memname,

count(memname) as cnt



from dictionary.tables



where libname="&dir";



quit;

run;



title1 "&dir members from Data Dictionary";

proc print data=work.mem; run;



%mend strings;



%strings(MYLIB);



/*************************************************************************

Create a sequential macro var for each data set.

*************************************************************************/



%macro worklist;



%global cntlist;



data _null_;



set work.mem end=eof;



call symput('dsn'||left(put(_n_,3.0)),left(trim(memname)));



if (eof) then call



symput('cntlist',put(_n_,3.0));



run;



%mend worklist;



%worklist;



%put &cntlist;





/*************************************************************************

Display each macro var on the SASlog.

*************************************************************************/



%macro showlist;



%do i=1 %to &cntlist;



%put Data Set Name: &&dsn&i;



%end;



%mend showlist;



%showlist;





/*************************************************************************

Merge (or coancatenate???) the data sets. Is the variable common to

each data set of the same name? If so, a rename will be necessary.

*************************************************************************/



Toby Dunn's code will then work here.







-----Original Message-----
From: Dunn, Toby [mailto:***@TEA.STATE.TX.US]
Sent: Tuesday, June 29, 2004 9:57 AM
To: SAS-***@listserv.uga.edu
Subject: Re: Loop to merge multiple datasets to one dataset

Dan,

Try using a SQL statement with an into : <macro variable> to get your
datasets from your library.


Then if your goal is to merge them all together in one huge file with
demofile do something like the following;


%macro merge_demo;

Data bigfile;
merge demofile (in=a)

%do I = 1 to &stop;
%let dsn = %scan(<your macro var>, &I);

&dsn (in=b rename = (newvar = &dsn._newvar) keep=(id trial
&dsn._newvar));

%end;

by id trial;


You will noticed I renamed your newvar variable. I assumed that all
datasets to be merged had the same var and you have to have some way to
distinguish them from one another or they will be overwritten by the
next data set.

HTH
Toby Dunn



-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-***@LISTSERV.UGA.EDU] On Behalf Of Dan
Litsky
Sent: Tuesday, June 29, 2004 8:35 AM
To: SAS-***@LISTSERV.UGA.EDU
Subject: Loop to merge multiple datasets to one dataset

I have 250 datasets in one directory all starting with the letters "ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc.
There are "holes" in the numbers.

I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one field
that I need on the new large dataset.

My merge might look like this:

Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;

I'd like to read through the entire directory and merge in a loop if I
could. Is there a way to do this? I've check through the archives but
must be using the wrong set of search words because I can't find
anything that applies.

Thanks,
Dan




---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Gerhard Hellriegel
2004-06-29 15:11:50 UTC
Permalink
Yes, that is possible. But: what are the "datasets". SAS?, sequential,
whatever?,... What is your OS? I assume Win, because you're talking about
directories. (UNIX it's similar).
Unfortunately I've no windows installation here, so I can't test that. The
way is:

1. open the directory with DOPEN
2. DREAD the entries in a loop. If it is one of your datasets, write it to
a temporary dataset.
3. read the datsets and bring your ds-names into a macro array.
4. if you have external data, you must bring that into SAS, before you can
merge.

Or are that SAS datasets? In that case all is different. In that case you
do not need to read any directories. You have all in SASHELP, what you need.

Ok, let's assume, they are external...

%macro readall(path=);
filename a "&path";
data temp;
length ds $100;
list=dopen("a");
no=dnum(list);
do i=1 to no;
ds=dread(list,i);
if upcase(ds)=:"DS" then output;
end;
run;
%let no=0;
data _null_;
set temp;
call symput("n"!!compress(put(_n_,8.)),compress(ds));
call symput("no",_n_);
run;
%do i=1 %to &no;
data d&i;
infile "&path.&&n&i";
input ...... read it ....;
run;
%end;
%do i=1 %to &no;
sort them ....
proc sort data=d&i;
by ....;
run;
%end;
data all;
merge
%do i=1 %to &no;
d&i
%end;
;
by .....;
run;
%mend;
%readall(path=/your/path/);

First use options mprint to see, if all is like you want it.

If you have SAS datasets, all is a bit different. You can select all the
names from SASHELP.VSTABLE and put them in macro variables.
You then simply should sort them all and then merge like above.
Post by Dan Litsky
I have 250 datasets in one directory all starting with the letters "ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc. There
are "holes" in the numbers.
Post by Dan Litsky
I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one field that
I need on the new large dataset.
Post by Dan Litsky
Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;
I'd like to read through the entire directory and merge in a loop if I
could. Is there a way to do this? I've check through the archives but
must be using the wrong set of <a href="http://www.ntsearch.com/search.php?
q=search&v=54&src=zon">search</a> words because I can't find anything that
applies.
Post by Dan Litsky
Thanks,
Dan
---------------------------------
Do you Yahoo!?
Yahoo! <a href="http://www.ntsearch.com/search.php?
q=Mail&v=54&src=zon">Mail</a> - 50x more <a
href="http://www.ntsearch.com/search.php?
q=storage&v=54&src=zon">storage</a> than other providers!
Dunn, Toby
2004-06-29 15:57:49 UTC
Permalink
Jules,



Seems your taking the long way to get to were you want to go to, in
order to make my code work.



What is needed is the following SQL statement:



Proc SQL noprint;

Select mamname into : mem_list separated by " "

From sashelp.vcolumn

Where libname = "your libname here";



%let stop = &sqlobs;



Quit;



%macro merge_demo;

Data bigfile;
merge demofile (in=a)



%do I = 1 to &stop;
%let dsn = %scan(<your macro var>, &I);



&dsn (in=b rename = (newvar = &dsn._newvar) keep=(id trial
&dsn._newvar));

%end;

by id trial;



run;





%mend merg_demo;



%merg_demo;



Toby Dunn



-----Original Message-----
From: Bosch, Jules [PRDUS Non J&J] [mailto:***@PRDUS.JNJ.COM]
Sent: Tuesday, June 29, 2004 9:43 AM
To: Dunn, Toby; SAS-***@listserv.uga.edu
Subject: RE: Loop to merge multiple datasets to one dataset



Dan,

Some of the following SAS code may be helpful. With this code you don't
need to know how many members are in the library or their names,
assuming you want to process ALL members in the library.

HTH,

Jules Bosch



dm 'clear out'; dm 'clear log';





libname mylib '...';





/***********************************************************************
**

Use the SAS Data Dictionary to create a data set with all library


member names. LIBNAME must be available.


************************************************************************
*/




%macro strings(dir);





proc sql noprint;





create table work.mem as





select memname,


count(memname) as cnt





from dictionary.tables





where libname="&dir";





quit;


run;





title1 "&dir members from Data Dictionary";


proc print data=work.mem; run;





%mend strings;





%strings(MYLIB);





/***********************************************************************
**

Create a sequential macro var for each data set.


************************************************************************
*/




%macro worklist;





%global cntlist;





data _null_;





set work.mem end=eof;





call symput('dsn'||left(put(_n_,3.0)),left(trim(memname)));





if (eof) then call





symput('cntlist',put(_n_,3.0));





run;





%mend worklist;





%worklist;





%put &cntlist;








/***********************************************************************
**

Display each macro var on the SASlog.


************************************************************************
*/




%macro showlist;





%do i=1 %to &cntlist;





%put Data Set Name: &&dsn&i;





%end;





%mend showlist;





%showlist;








/***********************************************************************
**

Merge (or coancatenate???) the data sets. Is the variable common to


each data set of the same name? If so, a rename will be necessary.


************************************************************************
*/




Toby Dunn's code will then work here.










-----Original Message-----
From: Dunn, Toby [mailto:***@TEA.STATE.TX.US]
Sent: Tuesday, June 29, 2004 9:57 AM
To: SAS-***@listserv.uga.edu
Subject: Re: Loop to merge multiple datasets to one dataset

Dan,

Try using a SQL statement with an into : <macro variable> to get your
datasets from your library.



Then if your goal is to merge them all together in one huge file with
demofile do something like the following;



%macro merge_demo;

Data bigfile;
merge demofile (in=a)

%do I = 1 to &stop;
%let dsn = %scan(<your macro var>, &I);

&dsn (in=b rename = (newvar = &dsn._newvar) keep=(id trial
&dsn._newvar));

%end;

by id trial;



You will noticed I renamed your newvar variable. I assumed that all
datasets to be merged had the same var and you have to have some way to
distinguish them from one another or they will be overwritten by the
next data set.

HTH
Toby Dunn



-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-***@LISTSERV.UGA.EDU] On Behalf Of Dan

Litsky
Sent: Tuesday, June 29, 2004 8:35 AM
To: SAS-***@LISTSERV.UGA.EDU
Subject: Loop to merge multiple datasets to one dataset

I have 250 datasets in one directory all starting with the letters "ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc.
There are "holes" in the numbers.

I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one field
that I need on the new large dataset.

My merge might look like this:

Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;

I'd like to read through the entire directory and merge in a loop if I
could. Is there a way to do this? I've check through the archives but
must be using the wrong set of search words because I can't find
anything that applies.

Thanks,
Dan





---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Lou
2004-06-30 01:31:32 UTC
Permalink
Let's establish some ground rules. You're assembling a dataset of
demographic data. That usually means there will be one observation per key
value. In this case the key is comprised of two variables - ID and TRIAL.

Your dataset is lacking one variable - NEWVAR - and the values for this
variable that match the various keys are scattered among 250 datasets, the
names of which are unknown, although the names all fit a particular rule and
the datasets are all in the same library.

Since you want to merge successive datasets, and a merge will overlay
variable values, we can surmise that either
1. Each key value occurs no more than once in all 250 datasets, or
2. You're not particularly choosy about which value of NEWVAR get applied
to a particular key

Taking your sample code at face value, we can also surmise that all 250 data
sets, as well as your demography file, are sorted by the key values.

OK so far?

If so, I'd suggest interleaving your 250 datasets and using the result to
UPDATE your file of demographic data, along the lines of the following
tested code.

proc sql noprint;
select memname into :alldsns separated by ' '
from dictionary.tables
where libname = 'your-libname-here' and
memname like 'DS%';
quit;
data alldata (keep = id trial newvar);
set &alldsns;
by id trial;
run;
data bigfile;
update demofile alldata;
by id trial;
run;
Post by Dan Litsky
I have 250 datasets in one directory all starting with the letters "ds"
and numbered semi-sequentially. So there is a ds1 ds2 ds5 ds8, etc. There
are "holes" in the numbers.
Post by Dan Litsky
I want to merge all of these datasets to one dataset containing
demographic information. The multiple datasets all contain one field that
I need on the new large dataset.
Post by Dan Litsky
Data bigfile;
merge demofile (in=a)
ds1 (in=b keep=(id trial newvar));
by id trial;
I'd like to read through the entire directory and merge in a loop if I
could. Is there a way to do this? I've check through the archives but must
be using the wrong set of search words because I can't find anything that
applies.
Post by Dan Litsky
Thanks,
Dan
---------------------------------
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
Continue reading on narkive:
Search results for 'Loop to merge multiple datasets to one dataset' (Questions and Answers)
3
replies
Exclude Records in SQL if the opposite is present?
started 2010-01-26 11:52:39 UTC
programming & design
Loading...