Clustered Standard Errors

w***@gmail.com

2006-06-12 15:20:52 UTC

Dear SAS users,

I have just converted to SAS from Stata. I am using PROC GLM, which
absorbs one fixed effect and uses the CLASS statement to knock out two
other fixed effects. I want to cluster my standard errors by a
variable, but I could not find the syntax to do this.

My code looks like
proc sort data=mfn;
  by id;
run;

proc glm data=mfn;
  absorb id;
  class indt ct;
  model lq = indt ct tf / solution;
run;

I want to cluster by a variable, say X. Does anyone know how I can do
this?

Thanks

Wensui Liu

2006-06-12 15:53:39 UTC

you can find it in proc genmod
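
A minimal sketch of how that could look, assuming the cluster variable is called x as in the question and the other names (mfn, lq, indt, ct, tf) are the ones from the posted code. In PROC GENMOD the clustering is requested through the REPEATED statement rather than a dedicated cluster option. With an independence working correlation, the empirical standard errors it reports are cluster-robust. The sketch leaves out the absorbed id effect.

```sas
/* Sketch only: linear fit with standard errors clustered on x.       */
/* mfn, lq, indt, ct, tf come from the question; x is the placeholder */
/* cluster variable named there.                                      */
proc genmod data=mfn;
  class indt ct x;                 /* the SUBJECT= variable must be a CLASS variable */
  model lq = indt ct tf / dist=normal link=identity;
  repeated subject=x / type=ind;   /* independence working correlation;              */
                                   /* reported SEs are the empirical (sandwich) ones */
run;
```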

--
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center

w***@gmail.com

2006-06-13 13:51:13 UTC

Thanks. PROC GENMOD does let me cluster. However, I should have
also stated that I am ABSORBing a variable in my PROC GLM, and it
does not appear that GENMOD can handle absorption.

Is there a way around this?

Thanks

David L Cassell

2006-06-14 06:52:41 UTC

It would help a lot if you would explain what you are actually doing.
What are your data, your data sources, and your meta-data? What is
the big picture, what are your hypotheses, and what are you trying to
get out of the analysis?

You mention 'clusters', but you don't explain what you mean. Do you
have survey sample data? If so, PROC GLM is the wrong thing to use.

Are you trying to model the covariance matrix? If so, then PROC
GLM is the wrong thing to use.

Are you going to have leverage points or outliers? If so, then PROC
GLM is still going to be the wrong thing to use.

Are the residuals going to be non-normally distributed? If so, then once
again PROC GLM is the wrong thing to use.

Now then. Why are you trying to ABSORB the id variable? Why are
you trying to 'cluster by a variable'? The ABSORB statement was really
handy back when there were serious limits on RAM and CPU.
But unless you have some mammoth data sets, you may not even need
it. However, I cannot tell, since you have not supplied SAS-L with
enough information. Yet. :-)

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330


w***@gmail.com

2006-06-14 14:36:04 UTC

I have a large dataset with 1 million observations. The model regresses
Y on X plus 3 very large fixed effects. The "ID" fixed effect has
approximately 200,000 levels, which is why I am absorbing it. The other
two fixed effects are smaller, and since SAS cannot absorb more than
one non-nested fixed effect, I add these two fixed effects to the class
statement. Adding dummies to a regression is a standard way to run a
fixed effect regression when the panel is unbalanced and therefore
cannot be de-meaned without following a complicated de-meaning process.

The panel structure of the data is ID by year. Within a region, there
are thousands of IDs associated with that region. I therefore want to
impose that the errors within a region are correlated, but independent
across regions, which is why I want to cluster the standard errors. In
Stata, I would just add the "cluster" option to my regression command,
but because of the large number of observations, I am attempting to
implement this in SAS.

So, I was wondering whether it is possible to impose this correlation
structure within PROC GLM.

Thanks.

Ban Cheah

2006-06-14 19:06:10 UTC

There is a claim here:
http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm
that the standard errors from Stata's cluster option can be
replicated using PROC SURVEYREG. I found this claim mind-boggling and
confusing at the same time. The author says he replicated the
Stata standard errors from the cluster option using PROC SURVEYREG even
though he was working with simulated panel data sets rather than survey
data. I have not verified this myself, however.
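
For reference, a minimal sketch of what such a PROC SURVEYREG call might look like with the variable names from the original question, assuming the cluster variable is called x. It is untested against the poster's data and leaves out the absorbed id effect.

```sas
/* Sketch only: regression with standard errors clustered on x.       */
/* mfn, lq, indt, ct, tf come from the question; x is the placeholder */
/* cluster variable named there. The id fixed effect is not handled.  */
proc surveyreg data=mfn;
  cluster x;                          /* observations sharing x form one cluster */
  class indt ct;                      /* dummy-coded fixed effects               */
  model lq = indt ct tf / solution;   /* SOLUTION prints the parameter estimates */
run;
```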

AnnonymousC

2014-05-31 09:13:32 UTC

I think you can:
(1) remove observations with missing values
(2) demean the independent variables using proc standard
(3) regress the dependent variable on the demeaned independent variables

http://pages.stern.nyu.edu/~adesouza/sasfinphd/index/node60.html
http://pages.stern.nyu.edu/~adesouza/sasfinphd/index/node61.html

The coefficients from the above procedure are exactly the same as those from proc glm (Frisch-Waugh theorem), but you do not have to create dummies (which is your main problem). To get robust standard errors, you can simply run proc surveyreg in step (3).
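
The three steps could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe. The data set and variable names are taken from the original question, x stands in for the cluster variable, and the intermediate data set names (mfn_cc, mfn_dm) are made up. Only the id fixed effect is removed here. For the Frisch-Waugh equivalence to carry over to the indt and ct effects, their dummies would have to be demeaned by id as well. The sketch also demeans the dependent variable, which leaves the coefficients unchanged but makes the residuals match the fixed-effects fit. The degrees of freedom PROC SURVEYREG uses do not account for the absorbed id levels.

```sas
/* Step 1: keep complete cases only (CMISS counts missing numeric or character values). */
data mfn_cc;
  set mfn;
  if cmiss(id, x, lq, tf) = 0;
run;

/* PROC STANDARD with a BY statement needs the data sorted by id. */
proc sort data=mfn_cc;
  by id;
run;

/* Step 2: subtract the within-id mean from each listed variable. */
proc standard data=mfn_cc mean=0 out=mfn_dm;
  by id;
  var lq tf;
run;

/* Step 3: regress demeaned lq on demeaned tf, clustering on x. */
proc surveyreg data=mfn_dm;
  cluster x;
  model lq = tf / solution;
run;
```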

Hope that helps.

o***@gmail.com

2019-04-14 18:28:02 UTC

Hi, for those who keep running into this limitation of SAS's PROC GLM, here's a macro that combines FE absorption, multi-way clustering and instrumental variables: http://olivier.godechot.free.fr/hopfichiers/felm.sas

I hope this helps.