There is a claim here:
that the standard errors from Stata's cluster option can actually be
replicated using PROC SURVEYREG. I found this claim mind-boggling and
confusing at the same time. The author claims that he replicated the
Stata standard errors with the cluster option using PROC SURVEYREG even
though he was not working with survey data. He was working with
simulated panel data sets. I have not actually verified this for myself.
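For reference, here is a minimal sketch of what such a PROC SURVEYREG call might look like with the variables from the code quoted further down. It is untested. REGION is an assumed name for the clustering variable (the original post only says "say X"), and the ID fixed effect is left out because, as far as I know, PROC SURVEYREG has no ABSORB statement, so that effect would have to be handled separately.

```sas
/* Sketch only, not verified against Stata output.                   */
/* REGION is a hypothetical name for the clustering variable;        */
/* MFN, INDT, CT, LQ and TF come from the code quoted in the thread. */
proc surveyreg data=mfn;
   cluster region;                    /* treat each region as one cluster   */
   class indt ct;                     /* the two smaller fixed effects      */
   model lq = indt ct tf / solution;  /* print the parameter estimates      */
run;
```

The point estimates should be the same least-squares estimates that PROC GLM gives for this MODEL statement; only the standard errors change, since the CLUSTER statement makes the variance estimate allow for correlation within a region.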
From: firstname.lastname@example.org [mailto:email@example.com]
On Behalf Of ***@gmail.com
Sent: Wednesday, June 14, 2006 10:36 AM
Subject: Re: Clustered Standard Errors
I have a large dataset with 1 million observations. The model regresses
Y on X plus 3 very large fixed effects. The "ID" fixed effect has
approximately 200,000 levels, which is why I am absorbing it. The other
two fixed effects are smaller, and since SAS cannot absorb more than one
non-nested fixed effect, I add these two fixed effects to the class
statement. Adding dummies to a regression is a standard way to run a
fixed effect regression when the panel is unbalanced and therefore
cannot be de-meaned without following a complicated de-meaning process.
The panel structure of the data is ID by year. Within a region, there
are thousands of IDs associated with that region. I therefore want to
impose that the errors within a region are correlated, but independent
across regions, which is why I want to cluster the standard errors. In
Stata, I would just add the "cluster" option to my regression line, but
because of the large number of observations, I am attempting to
implement this in SAS.
So, I was wondering if it is possible to impose this correlation
structure in PROC GLM.
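The correlation structure described above (errors correlated within a region, independent across regions) corresponds to the usual cluster-robust "sandwich" variance estimator. As a sketch, in notation assumed here rather than taken from the thread: g = 1, ..., G indexes regions, X_g and \hat{e}_g are the regressor rows and OLS residuals for region g, N is the number of observations and K the number of estimated coefficients.

```latex
\hat{V}_{\text{cluster}}(\hat{\beta})
  = c \,(X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \,\hat{e}_g \hat{e}_g' \, X_g \right)
    (X'X)^{-1},
\qquad
c = \frac{G}{G-1} \cdot \frac{N-1}{N-K}
```

The factor c is a finite-sample correction. The form shown is the one Stata's regress is generally documented as using with the cluster option; other software may scale differently, so small differences in reported standard errors can come from this factor alone.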
Post by David L Cassell
Post by firstname.lastname@example.org
Dear SAS users,
I have just converted to SAS from Stata. I am using PROC GLM, which
absorbs one fixed effect and uses the CLASS statement to knock out
two other fixed effects. I want to cluster my standard errors by a
variable, but I could not find the syntax to do this.
My code looks like:
proc glm data = mfn;
class indt ct;
model lq = indt ct tf / solution;
run;
I want to cluster by a variable, say X. Does anyone know how I can do
this?
It would help a lot if you would explain what you are actually doing.
What are your data, your data sources, and your meta-data? What is
the big picture, what are your hypotheses, and what are you trying to
get out of the analysis?
You mention 'clusters', but you don't explain what you mean. Do you
have survey sample data? If so, PROC GLM is the wrong thing to use.
Are you trying to model the covariance matrix? If so, then PROC GLM
is the wrong thing to use.
Are you going to have leverage points or outliers? If so, then PROC
GLM is still going to be the wrong thing to use.
Are the residuals going to be non-normally distributed? If so, then once
again PROC GLM is the wrong thing to use.
Now then. Why are you trying to ABSORB the id variable? Why are you
trying to 'cluster by a variable'? The ABSORB statement was really
handy back when there were serious limits on RAM and CPU.
But unless you have some mammoth data sets, you may not even need it.
However, I cannot tell, since you have not supplied SAS-L with enough
information. Yet. :-)
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330