There is a claim here:

http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm

that the standard errors from Stata's cluster option can be replicated
using PROC SURVEYREG. I found this claim both surprising and confusing.
The author says he replicated the Stata clustered standard errors with
PROC SURVEYREG even though he was working with simulated panel data
sets rather than survey data. I have not verified this myself, however.
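
As a rough sketch of the kind of call that claim implies, and not code
copied from the page, a PROC SURVEYREG step with a CLUSTER statement
might look like the following. The data set and variable names (panel,
y, x, firmid) are placeholders assumed for illustration, and the output
has not been checked against Stata here.

```sas
/* Sketch only: panel, y, x and firmid are assumed placeholder names. */
proc surveyreg data=panel;
  cluster firmid;   /* observations sharing a firmid form one cluster */
  model y = x;      /* same regression as before; only the variance estimate changes */
run;
```

With no STRATA or WEIGHT statement, the only survey design feature in
play is the clustering, which is why the procedure can be pointed at
non-survey panel data.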

I have a large dataset with 1 million observations. The model regresses
Y on X plus 3 very large fixed effects. The "ID" fixed effect has
approximately 200,000 levels, which is why I am absorbing it. The other
two fixed effects are smaller, and since SAS cannot absorb more than one
non-nested fixed effect, I add these two to the CLASS statement. Adding
dummies is a standard way to run a fixed-effects regression when the
panel is unbalanced and therefore cannot be de-meaned without a
complicated de-meaning process.

The panel structure of the data is ID by year. Each region contains
thousands of IDs. I therefore want to allow the errors to be correlated
within a region but independent across regions, which is why I want to
cluster the standard errors. In Stata, I would just add the "cluster"
option to my regression line, but because of the large number of
observations, I am attempting to implement this in SAS.

So, I was wondering whether it is possible to impose this correlation
structure in PROC GLM.

Thanks.

Post by David L Cassell

Post by w***@gmail.com

Dear SAS users,

I have just converted to SAS from Stata. I am using PROC GLM, which
absorbs one fixed effect and uses the CLASS statement to knock out
two other fixed effects. I want to cluster my standard errors by a
variable, but I could not find the syntax to do this.

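My code looks like

/* ABSORB requires the data to be sorted by the absorbed variable */
proc sort data=mfn;
  by id;
run;

proc glm data=mfn;
  absorb id;                          /* id fixed effect, absorbed */
  class indt ct;                      /* the two smaller fixed effects */
  model lq = indt ct tf / solution;   /* print the parameter estimates */
run;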

I want to cluster by a variable, say X. Does anyone know how I can do this?

Thanks

It would help a lot if you would explain what you are actually doing.
What are your data, your data sources, and your meta-data? What is
the big picture, what are your hypotheses, and what are you trying to
get out of the analysis?

You mention 'clusters', but you don't explain what you mean. Do you
have survey sample data? If so, PROC GLM is the wrong thing to use.

Are you trying to model the covariance matrix? If so, then PROC GLM
is the wrong thing to use.

Are you going to have leverage points or outliers? If so, then PROC
GLM is still going to be the wrong thing to use.

Are the residuals going to be normally distributed? If not, then once
again PROC GLM is the wrong thing to use.

Now then. Why are you trying to ABSORB the id variable? Why are you
trying to 'cluster by a variable'? The ABSORB statement was really
handy back when there were serious limits on RAM and CPU. But unless
you have some mammoth data sets, you may not even need it. However, I
cannot tell, since you have not supplied SAS-L with enough
information. Yet. :-)

