Discussion:
assign a unique random integer to each unique id
(too old to reply)
Ai Hua Wang
2010-01-15 16:09:54 UTC
Permalink
Hi,
=A0
I was wondering if anybody in this list could advise how I can assign a uni=
que random integer to each unique id. I have written the following code but=
it does not allow me to get the unique random intergers. The 10000000 is p=
roportional to the size of the data set. Do I miss anything in the codes? P=
lease kindly provide your suggestions.
=A0
Thanks,
Aihua
=A0

data temp;
set datset1;
urand=3Dceil(ranuni(1)*10000000);
run;
=A0=0A=0A=0A _________________________________________________________=
_________=0ABe smarter than spam. See how smart SpamGuard is at giving junk=
email the boot with the All-new Yahoo! Mail. Click on Options in Mail and=
switch to New Mail today or register for free at http://mail.yahoo.ca
Joe Matise
2010-01-15 16:24:18 UTC
Permalink
I believe that will generally work, but the size of your sample determines
if it's likely to work 100%. You won't get 100% guarantee that it will be
unique, because the period of the ranuni function is much larger than that;
so periodically if reduced to a period of only 1e7 it will duplicate. 1e7
should be enough in my experience for a 1e4 order sample size.

A safer way would be to generate a random number (not an integer, just the
ranuni() itself), then sort by it, then use _N_ . That ensures a unique
integer.

-Joe
Post by Ai Hua Wang
Hi,
I was wondering if anybody in this list could advise how I can assign a
unique random integer to each unique id. I have written the following code
but it does not allow me to get the unique random intergers. The 10000000 is
proportional to the size of the data set. Do I miss anything in the codes?
Please kindly provide your suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
__________________________________________________________________
Be smarter than spam. See how smart SpamGuard is at giving junk email the
boot with the All-new Yahoo! Mail. Click on Options in Mail and switch to
New Mail today or register for free at http://mail.yahoo.ca
Muthia Kachirayan
2010-01-15 17:16:56 UTC
Permalink
Aihua,

The use of ranuni() produces non-unique numbers, it is possible to check
whether the present random number has been selected earlier or not - and
when selected - ignore the current and take a new random number. To help
this process use an array as shown below.

data _null_;
if 0 then set sashelp.class nobs = n;
call symputx('n',n);
stop;
run;

data have;
array k[&n] _temporary_;
do _n_ = 1 to &n;
do while(1);
urand = ceil(ranuni(123) * &n);
if k[urand] = . then do;
k[urand] = 1;
output;
leave;
end;
end;
end;
run;

data class;
set have;
set sashelp.class;
run;

proc print data = class;
run;

You may easily modify the 2 data steps into one data step.

With regards,

Muthia Kachirayan
Post by Ai Hua Wang
Hi,
I was wondering if anybody in this list could advise how I can assign a
unique random integer to each unique id. I have written the following code
but it does not allow me to get the unique random intergers. The 10000000 is
proportional to the size of the data set. Do I miss anything in the codes?
Please kindly provide your suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
__________________________________________________________________
Be smarter than spam. See how smart SpamGuard is at giving junk email the
boot with the All-new Yahoo! Mail. Click on Options in Mail and switch to
New Mail today or register for free at http://mail.yahoo.ca
Nordlund, Dan (DSHS/RDA)
2010-01-16 01:11:28 UTC
Permalink
-----Original Message-----
Hua Wang
Sent: Friday, January 15, 2010 8:10 AM
Subject: assign a unique random integer to each unique id
Hi,
I was wondering if anybody in this list could advise how I can assign a unique
random integer to each unique id. I have written the following code but it does not
allow me to get the unique random intergers. The 10000000 is proportional to the
size of the data set. Do I miss anything in the codes? Please kindly provide your
suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
Aihua,

Well, without knowing what you are going to use those "random" numbers for, it is hard to give good advice. Why does your multiplier need to be proportional to dataset size? Why do you want random integers assigned to your data? And why do they need to be unique? If you tell us more about what your actual needs are, we might be able to provide better help.

That being said, the following code will assign unique integers to your data, as long as you have fewer than 2**31 - 1 records.

data want;
if _n_=1 then do;
**----urand will be your random integer----**;
urand=0;
call ranuni(urand,dummy); **get a starting seed;
put "original seed = " urand; **"save" starting seed to log;
retain urand ;
end;

set datset1;
call ranuni(urand,dummy);

drop dummy;
run;

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Patrick
2010-01-16 05:30:31 UTC
Permalink
Aihua

I believe the code below does what you want.

It's kind of a hash version of what Muthia does with an array.

It's mainly about creating a hash table which contains all unique
random integers which already have been used in earlier iterations of
the data step.
The code then loops until "urand" contains a new unique value. This
value is in the end of a data step iteration added to the hash table -
and therefore won't be used again later iterations of the data step.

HTH
Patrick


data have;
do var=1 to 20;
output;
end;
run;

data want (drop=rc multiplier);
set have;
retain multiplier 10;
if _n_ =1 then do;
declare hash UniqueRandomInteger(hashexp: 4);
rc = UniqueRandomInteger.defineKey('urand');
rc = UniqueRandomInteger.defineData('urand');
rc = UniqueRandomInteger.defineDone();
end;

do until (UniqueRandomInteger.check() ne 0);
urand=ceil(ranuni(1)*multiplier);
ind=sum(ind,1);
if ind>100 then do;
put "Too small set of possible unique random numbers. Multiplier
will be increased";
put "Multiplier before: " multiplier;
multiplier=multiplier*10;
put "Multiplier increased: " multiplier;
ind=0;
end;
end;

rc = UniqueRandomInteger.add();

run;

proc print data=want;
run;
Ai Hua Wang
2010-01-16 21:30:34 UTC
Permalink
Hi Dan:
=A0
Thank you very much for your thoughtful follow up. Please see my answers be=
low.
=A0
Why does your multiplier need to be proportional to dataset size?=A0
That is just my thought after I tried. Because when I use the smaller multi=
plier I got much more duplicates. When I increase it I got less. Eventually=
I found that it should be at least propotional to the size of the data set=
.
=A0
Why do you want random integers assigned to your data?=A0
I need to use the assigned random integers as the unique id to allow the da=
ta users to identify each unique record. I thought it is better to use the =
integer than the decimal numbers.
=A0
And why do they need to be unique?=A0
See above description and plus:
It is used as the replacement of the sensitive information (unique id)=A0fo=
r the privacy and confidentialiy concern.
=A0
I hope this is helpful when you provide more insightful answers.
=A0
Best Regards,
Aihua
=A0
=A0


--- On Sat, 1/16/10, Nordlund, Dan (DSHS/RDA) <***@dshs.wa.gov> wrote:


From: Nordlund, Dan (DSHS/RDA) <***@dshs.wa.gov>
Subject: RE: assign a unique random integer to each unique id
To: "Ai Hua Wang" <***@YAHOO.COM>, SAS-***@LISTSERV.UGA.EDU
Received: Saturday, January 16, 2010, 1:11 AM
-----Original Message-----
Hua Wang
Sent: Friday, January 15, 2010 8:10 AM
Subject: assign a unique random integer to each unique id
=20
Hi,
=20
I was wondering if anybody in this list could advise how I can assign a u=
nique
random integer to each unique id. I have written the following code but i=
t does not
allow me to get the unique random intergers. The 10000000 is proportional=
to the
size of the data set. Do I miss anything in the codes? Please kindly prov=
ide your
suggestions.
=20
Thanks,
Aihua
=20
=20
data temp;
set datset1;
urand=3Dceil(ranuni(1)*10000000);
run;
=20
=20
Aihua,

Well, without knowing what you are going to use those "random" numbers for,=
it is hard to give good advice.=A0 Why does your multiplier need to be pro=
portional to dataset size?=A0 Why do you want random integers assigned to y=
our data?=A0 And why do they need to be unique?=A0 If you tell us more abou=
t what your actual needs are, we might be able to provide better help.

That being said, the following code will assign unique integers to your dat=
a, as long as you have fewer than 2**31 - 1 records.

data want;=20
=A0 if _n_=3D1 then do;
=A0 =A0 **----urand will be your random integer----**;
=A0 =A0 urand=3D0;
=A0 =A0 call ranuni(urand,dummy); **get a starting seed;
=A0 =A0 put "original seed =3D " urand; **"save" starting seed to log;
=A0 =A0 retain urand ;=20
=A0 end;

=A0 set datset1;
=A0 call ranuni(urand,dummy);

drop dummy;
run;

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA=A0 98504-5204



=0A=0A=0A ____________________________________________________________=
______=0AYahoo! Canada Toolbar: Search from anywhere on the web, and bookma=
rk your favourite sites. Download it now=0Ahttp://ca.toolbar.yahoo.com.
Gary Klein
2010-01-16 22:24:43 UTC
Permalink
You may also want to try randomly permuting your subjects, then numbering them 1 to your sample size. Then each one will have a integer in random order without duplicates.

For a random permutation, you could either use proc surveysample to do a simple random sample without replacement or proc plan:

data oldids;
do oldid = 1 to 4;
output;
end;
run;

proc plan;
factors oldid=4 random /noprint;
output out=temp data=oldids;
run;
quit;

data newids;
set temp;

newid = _n_;
run;
Subject: Re: assign a unique random integer to each unique id
Date: Saturday, January 16, 2010, 1:30 PM
Thank you very much for your thoughtful follow up. Please
see my answers below.
Why does your multiplier need to be proportional to dataset
size?
That is just my thought after I tried. Because when I use
the smaller multiplier I got much more duplicates. When I
increase it I got less. Eventually I found that it should be
at least propotional to the size of the data set.
Why do you want random integers assigned to your data?
I need to use the assigned random integers as the unique id
to allow the data users to identify each unique record. I
thought it is better to use the integer than the decimal
numbers.
And why do they need to be unique?
It is used as the replacement of the sensitive information
(unique id) for the privacy and confidentialiy concern.
I hope this is helpful when you provide more insightful
answers.
Best Regards,
Aihua
Subject: RE: assign a unique random integer to each unique
id
Received: Saturday, January 16, 2010, 1:11 AM
-----Original Message-----
On Behalf Of Ai
Hua Wang
Sent: Friday, January 15, 2010 8:10 AM
Subject: assign a unique random integer to each unique
id
Hi,
I was wondering if anybody in this list could advise
how I can assign a unique
random integer to each unique id. I have written the
following code but it does not
allow me to get the unique random intergers. The
10000000 is proportional to the
size of the data set. Do I miss anything in the codes?
Please kindly provide your
suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
Aihua,
Well, without knowing what you are going to use those
"random" numbers for, it is hard to give good advice. Why
does your multiplier need to be proportional to dataset
size? Why do you want random integers assigned to your
data? And why do they need to be unique? If you tell us
more about what your actual needs are, we might be able to
provide better help.
That being said, the following code will assign unique
integers to your data, as long as you have fewer than 2**31
- 1 records.
data want;
if _n_=1 then do;
**----urand will be your random integer----**;
urand=0;
call ranuni(urand,dummy); **get a starting seed;
put "original seed = " urand; **"save" starting seed
to log;
retain urand ;
end;
set datset1;
call ranuni(urand,dummy);
drop dummy;
run;
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
__________________________________________________________________
Yahoo! Canada Toolbar: Search from anywhere on the web, and
bookmark your favourite sites. Download it now
http://ca.toolbar.yahoo.com.
Daniel Nordlund
2010-01-17 01:46:10 UTC
Permalink
Aihua,

Given your answers below, my original solution will work just fine. You will get unique integers for your random IDs. However, the IDs will be randomly selected from the range 1 to 2**31 - 1. If you would prefer to have "random" IDs that have a range of 1 to the number of observations in your dataset, then you could do something like:

data _null_;
if 0 then set have nobs=nobs;
call symput('nobs', cats(nobs));
run;

data want;
array _x[*] _x1-_x&nobs;
drop _x1-_x&nobs;
do _n_ = 1 to &nobs;
_x[_n_] = _n_;
end;

seed = 9385247;
drop seed;
call ranperm(seed, of _x:);

do _n_ = 1 to &nobs;
set have;
rand_id = _x[_n_];
output;
end;
run;

This will assign random IDs in one pass through the data. Since you probably know the number of observations in your dataset, you could drop the data _null_ data step and just replace &nobs everywhere with the known number of observations.

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
-----Original Message-----
Hua Wang
Sent: Saturday, January 16, 2010 1:31 PM
Subject: Re: assign a unique random integer to each unique id
Thank you very much for your thoughtful follow up. Please see my answers
below.
Why does your multiplier need to be proportional to dataset size?
That is just my thought after I tried. Because when I use the smaller
multiplier I got much more duplicates. When I increase it I got less.
Eventually I found that it should be at least propotional to the size of
the data set.
Why do you want random integers assigned to your data?
I need to use the assigned random integers as the unique id to allow the
data users to identify each unique record. I thought it is better to use
the integer than the decimal numbers.
And why do they need to be unique?
It is used as the replacement of the sensitive information (unique id) for
the privacy and confidentialiy concern.
I hope this is helpful when you provide more insightful answers.
Best Regards,
Aihua
Subject: RE: assign a unique random integer to each unique id
Received: Saturday, January 16, 2010, 1:11 AM
-----Original Message-----
Hua Wang
Sent: Friday, January 15, 2010 8:10 AM
Subject: assign a unique random integer to each unique id
Hi,
I was wondering if anybody in this list could advise how I can assign a
unique
random integer to each unique id. I have written the following code but
it does not
allow me to get the unique random intergers. The 10000000 is
proportional to the
size of the data set. Do I miss anything in the codes? Please kindly
provide your
suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
Aihua,
Well, without knowing what you are going to use those "random" numbers
for, it is hard to give good advice. Why does your multiplier need to be
proportional to dataset size? Why do you want random integers assigned to
your data? And why do they need to be unique? If you tell us more about
what your actual needs are, we might be able to provide better help.
That being said, the following code will assign unique integers to your
data, as long as you have fewer than 2**31 - 1 records.
data want;
if _n_=1 then do;
**----urand will be your random integer----**;
urand=0;
call ranuni(urand,dummy); **get a starting seed;
put "original seed = " urand; **"save" starting seed to log;
retain urand ;
end;
set datset1;
call ranuni(urand,dummy);
drop dummy;
run;
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
__________________________________________________________________
Yahoo! Canada Toolbar: Search from anywhere on the web, and bookmark your
favourite sites. Download it now
http://ca.toolbar.yahoo.com.
Warren Schlechte
2010-01-18 11:14:24 UTC
Permalink
Here's what I would do:

Extract just the ids: proc sql; select unique id, ranuni as random ;from dataset; order by random;
Then using a datastep, do: data dataset; set dataset; if _n_=1 then integer=1; else integer+1;
Then merge this dataset back into the original dataset.

Warren Schlechte

-----Original Message-----
From: Nordlund, Dan (DSHS/RDA) [mailto:***@DSHS.WA.GOV]
Sent: Fri 1/15/2010 7:11 PM
Subject: Re: assign a unique random integer to each unique id
-----Original Message-----
Hua Wang
Sent: Friday, January 15, 2010 8:10 AM
Subject: assign a unique random integer to each unique id
Hi,
I was wondering if anybody in this list could advise how I can assign a unique
random integer to each unique id. I have written the following code but it does not
allow me to get the unique random intergers. The 10000000 is proportional to the
size of the data set. Do I miss anything in the codes? Please kindly provide your
suggestions.
Thanks,
Aihua
data temp;
set datset1;
urand=ceil(ranuni(1)*10000000);
run;
Aihua,

Well, without knowing what you are going to use those "random" numbers for, it is hard to give good advice. Why does your multiplier need to be proportional to dataset size? Why do you want random integers assigned to your data? And why do they need to be unique? If you tell us more about what your actual needs are, we might be able to provide better help.

That being said, the following code will assign unique integers to your data, as long as you have fewer than 2**31 - 1 records.

data want;
if _n_=1 then do;
**----urand will be your random integer----**;
urand=0;
call ranuni(urand,dummy); **get a starting seed;
put "original seed = " urand; **"save" starting seed to log;
retain urand ;
end;

set datset1;
call ranuni(urand,dummy);

drop dummy;
run;

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Muthia Kachirayan
2010-01-18 16:59:49 UTC
Permalink
Aihua,

Your clarification to Dan makes your need understood. My earlier array
solution permutes the observation numbers in random order to get the
urands. There is some chance that both of them may be same. The urands will
suggest the similarity between observation numbers and urands. This can be
rectified by choosing a multiplier to the number of observations to get a
random sequence of urands.

The test data set is:

data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;

The data set, HAVE, has KEY as the primary key in ascending order with some
SAT data.

key sat
1 0101
2 0202
3 0303
4 0404
5 0505
6 0606
7 0707
8 0808
9 0909
10 1010

The following program creates new UNIQUE IDs(uid) based on the number of
observations in the data set and note that the KEY is not used in the
process. Let us use a number, NUM_FOLD, to multiply the observation number,
say 1000.

%let num_fold = 1000;

data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = ceil(ranuni(123) * num * &num_fold);
do rc = h.check() by 0 while (rc = 0);
uid = ceil(ranuni(123) * num * &num_fold); ** Try another random
number ;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;

proc print data = need;
run;

The data set, NEED, gives the UID for the corresponding Record ID(RID) and
the KEY is dropped to keep the secrecy of data set.

RID sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877

My earlier array solution is a special case when NUM_FOLD = 1.

This program also gives another data set, LOOKUP, giving links to RID/KEY.
When it is sorted by UID, the reverse process of getting RID/KEY becomes
easy.

There is another possiblity of creating UIDs based on the KEYs and not based
on the observation number. The following program does it based on MOD()
function. However, KEYs, have to nemeric but this restriction can be removed
in some circumstances when the character-type KEYs can be changed to numeric
by the use of functions like, PIBw. For further details refer to Dorfman(Key
indexing, Bitmapping and Hashing).

data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = mod(key, num) + 1;
do rc = h.check() by 0 while (rc = 0);
** Try another random number ;
uid = uid + 1;
if uid > num then uid = 1;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;

Do you find this program useful to solve your issue ? Your feedback will be
useful to SAS-Lers to give alternate solutions.

Kind regards,
Muthia Kachirayan
Post by Ai Hua Wang
Thank you very much for your thoughtful follow up. Please see my answers
below.
Why does your multiplier need to be proportional to dataset size?
That is just my thought after I tried. Because when I use the smaller
multiplier I got much more duplicates. When I increase it I got less.
Eventually I found that it should be at least propotional to the size of the
data set.
Why do you want random integers assigned to your data?
I need to use the assigned random integers as the unique id to allow the
data users to identify each unique record. I thought it is better to use the
integer than the decimal numbers.
And why do they need to be unique?
It is used as the replacement of the sensitive information (unique id) for
the privacy and confidentialiy concern.
I hope this is helpful when you provide more insightful answers.
Best Regards,
Aihua
Mike Zdeb
2010-01-18 19:07:28 UTC
Permalink
hi ... nice

also, same answer, slight tweak of code in the hash routine ...

data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;

%let m=1000;

data want;
declare hash rnd ();
rnd.definekey ('uid');
rnd.definedone ();
do until (done);
set have end=done nobs=n;
do until (not rc);
uid = ceil(ranuni(123) * n * &m);
rc=rnd.add();
end;
output;
end;
stop;
drop key rc;
run;

proc print data=want;
run;

Obs sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877


--
Mike Zdeb
***@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
Post by Muthia Kachirayan
Aihua,
Your clarification to Dan makes your need understood. My earlier array
solution permutes the observation numbers in random order to get the
urands. There is some chance that both of them may be same. The urands will
suggest the similarity between observation numbers and urands. This can be
rectified by choosing a multiplier to the number of observations to get a
random sequence of urands.
data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;
The data set, HAVE, has KEY as the primary key in ascending order with some
SAT data.
key sat
1 0101
2 0202
3 0303
4 0404
5 0505
6 0606
7 0707
8 0808
9 0909
10 1010
The following program creates new UNIQUE IDs(uid) based on the number of
observations in the data set and note that the KEY is not used in the
process. Let us use a number, NUM_FOLD, to multiply the observation number,
say 1000.
%let num_fold = 1000;
data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = ceil(ranuni(123) * num * &num_fold);
do rc = h.check() by 0 while (rc = 0);
uid = ceil(ranuni(123) * num * &num_fold); ** Try another random
number ;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;
proc print data = need;
run;
The data set, NEED, gives the UID for the corresponding Record ID(RID) and
the KEY is dropped to keep the secrecy of data set.
RID sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877
My earlier array solution is a special case when NUM_FOLD = 1.
This program also gives another data set, LOOKUP, giving links to RID/KEY.
When it is sorted by UID, the reverse process of getting RID/KEY becomes
easy.
There is another possiblity of creating UIDs based on the KEYs and not based
on the observation number. The following program does it based on MOD()
function. However, KEYs, have to nemeric but this restriction can be removed
in some circumstances when the character-type KEYs can be changed to numeric
by the use of functions like, PIBw. For further details refer to Dorfman(Key
indexing, Bitmapping and Hashing).
data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = mod(key, num) + 1;
do rc = h.check() by 0 while (rc = 0);
** Try another random number ;
uid = uid + 1;
if uid > num then uid = 1;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;
Do you find this program useful to solve your issue ? Your feedback will be
useful to SAS-Lers to give alternate solutions.
Kind regards,
Muthia Kachirayan
Post by Ai Hua Wang
Thank you very much for your thoughtful follow up. Please see my answers
below.
Why does your multiplier need to be proportional to dataset size?
That is just my thought after I tried. Because when I use the smaller
multiplier I got much more duplicates. When I increase it I got less.
Eventually I found that it should be at least propotional to the size of the
data set.
Why do you want random integers assigned to your data?
I need to use the assigned random integers as the unique id to allow the
data users to identify each unique record. I thought it is better to use the
integer than the decimal numbers.
And why do they need to be unique?
It is used as the replacement of the sensitive information (unique id) for
the privacy and confidentialiy concern.
I hope this is helpful when you provide more insightful answers.
Best Regards,
Aihua
Muthia Kachirayan
2010-01-18 19:37:51 UTC
Permalink
Mike,

It is a good tweak !

Thanks.
Post by Mike Zdeb
hi ... nice
also, same answer, slight tweak of code in the hash routine ...
data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;
%let m=1000;
data want;
declare hash rnd ();
rnd.definekey ('uid');
rnd.definedone ();
do until (done);
set have end=done nobs=n;
do until (not rc);
uid = ceil(ranuni(123) * n * &m);
rc=rnd.add();
end;
output;
end;
stop;
drop key rc;
run;
proc print data=want;
run;
Obs sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877
--
Mike Zdeb
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
Post by Muthia Kachirayan
Aihua,
Your clarification to Dan makes your need understood. My earlier array
solution permutes the observation numbers in random order to get the
urands. There is some chance that both of them may be same. The urands
will
Post by Muthia Kachirayan
suggest the similarity between observation numbers and urands. This can
be
Post by Muthia Kachirayan
rectified by choosing a multiplier to the number of observations to get a
random sequence of urands.
data have;
do key = 1 to 10;
sat = put(key * 100 + key, z4.);
output;
end;
run;
The data set, HAVE, has KEY as the primary key in ascending order with
some
Post by Muthia Kachirayan
SAT data.
key sat
1 0101
2 0202
3 0303
4 0404
5 0505
6 0606
7 0707
8 0808
9 0909
10 1010
The following program creates new UNIQUE IDs(uid) based on the number of
observations in the data set and note that the KEY is not used in the
process. Let us use a number, NUM_FOLD, to multiply the observation
number,
Post by Muthia Kachirayan
say 1000.
%let num_fold = 1000;
data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = ceil(ranuni(123) * num * &num_fold);
do rc = h.check() by 0 while (rc = 0);
uid = ceil(ranuni(123) * num * &num_fold); ** Try another random
number ;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;
proc print data = need;
run;
The data set, NEED, gives the UID for the corresponding Record ID(RID)
and
Post by Muthia Kachirayan
the KEY is dropped to keep the secrecy of data set.
RID sat uid
1 0101 7504
2 0202 3210
3 0303 1784
4 0404 9061
5 0505 3572
6 0606 2212
7 0707 7865
8 0808 3981
9 0909 1247
10 1010 1877
My earlier array solution is a special case when NUM_FOLD = 1.
This program also gives another data set, LOOKUP, giving links to
RID/KEY.
Post by Muthia Kachirayan
When it is sorted by UID, the reverse process of getting RID/KEY becomes
easy.
There is another possiblity of creating UIDs based on the KEYs and not
based
Post by Muthia Kachirayan
on the observation number. The following program does it based on MOD()
function. However, KEYs, have to nemeric but this restriction can be
removed
Post by Muthia Kachirayan
in some circumstances when the character-type KEYs can be changed to
numeric
Post by Muthia Kachirayan
by the use of functions like, PIBw. For further details refer to
Dorfman(Key
Post by Muthia Kachirayan
indexing, Bitmapping and Hashing).
data need;
if _n_ = 1 then do;
declare hash h(hashexp:16);
h.definekey('uid');
h.definedata('RID','key','uid');
h.definedone();
end;
do RID = 1 to num;
set have nobs = num ;
uid = mod(key, num) + 1;
do rc = h.check() by 0 while (rc = 0);
** Try another random number ;
uid = uid + 1;
if uid > num then uid = 1;
rc = h.check();
end;
h.add();
output;
end;
h.output(dataset:'LOOKUP');
stop;
drop rc key;
run;
Do you find this program useful to solve your issue ? Your feedback will
be
Post by Muthia Kachirayan
useful to SAS-Lers to give alternate solutions.
Kind regards,
Muthia Kachirayan
Post by Ai Hua Wang
Thank you very much for your thoughtful follow up. Please see my answers
below.
Why does your multiplier need to be proportional to dataset size?
That is just my thought after I tried. Because when I use the smaller
multiplier I got much more duplicates. When I increase it I got less.
Eventually I found that it should be at least propotional to the size of
the
Post by Muthia Kachirayan
Post by Ai Hua Wang
data set.
Why do you want random integers assigned to your data?
I need to use the assigned random integers as the unique id to allow the
data users to identify each unique record. I thought it is better to use
the
Post by Muthia Kachirayan
Post by Ai Hua Wang
integer than the decimal numbers.
And why do they need to be unique?
It is used as the replacement of the sensitive information (unique id)
for
Post by Muthia Kachirayan
Post by Ai Hua Wang
the privacy and confidentialiy concern.
I hope this is helpful when you provide more insightful answers.
Best Regards,
Aihua
Loading...