use glimmix to test if the variance is significant for random variable

Peter Flom

2009-03-09 18:34:53 UTC

Dear all,
I'm running a logistic regression model with a random effect. The dependent
variable is binary: around 4000 cases have late_hiv_testing and
9000 do not. Because the cases cluster spatially (Moran's I is
significant), I include zipcode as a random effect in the logistic model
and fit it with PROC GLIMMIX.
My question is: how can I tell if the random effect is significant? From
the covariance parameter estimates?

My question, back to you, is why you care whether the random effect is significant.

Suppose it is significant .... what would you do?
Suppose it is not significant ... would you do something different?

One answer is that you might pool the data if the random effect were not significant. To me,
this is not good reasoning. At least in most cases, we should pool the data
(or not pool it) for other reasons:

1) Is it reasonable to pool?
2) Does pooling change estimates of fixed effects and their variances?

So, might the relationship between your DV and IV depend on geography? I don't know.
Perhaps you know, perhaps not.

But you can assess 2) by running both models.

The other problem is that the variance (and hence the significance) of the random effects
can be hard to estimate. Proper df is a contentious area. For longitudinal data, one book
(I think it's Hedeker and Gibbons, but can't swear to it) suggests leaving estimates of the
significance of random effects out of any reporting.

Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com

Jerry Davis

2009-03-09 19:11:04 UTC

My question is: how can I tell if the random effect is significant? From
the covariance parameter estimates?
proc glimmix data=a;
model late_hiv_testing(event="1")=/link=logit dist=binary ddfm=bw solution;
random intercept/subject=zipcode;
run;

If you have SAS 9.2 you can use the COVTEST statement in GLIMMIX.

proc glimmix data=a;
model late_hiv_testing(event="1")=/link=logit dist=binary ddfm=bw solution;
random intercept/subject=zipcode;
covtest 'Ho: No random effects' ZeroG; * tests if G matrix can be reduced to a 0 matrix;
run;

As an aside, I might be tempted to put zipcode in a class statement.
But I don't know enough about your data to say one way or the other.

Jerry
UGA

Marianne Kang

2009-03-09 20:50:39 UTC

Hi Peter,
Thanks for your reply.
1. I think the reason I want to see if the random effect is significant
is this: should I treat zipcode as a random variable or a fixed variable? If
the random effect is not significant, why should we bother to make it
random?

2. I performed Moran's I test and a hot-spot analysis; both showed that the
distribution of events (late HIV testing cases) has a clustering pattern.
That is the reason why I include zipcode as a random variable.

3. If I run the model with zipcode as a random variable in PROC GLIMMIX, then
run it again with zipcode as a fixed variable, and the beta coefficients for
the fixed variables change a lot between the two models, say by more than 20%, can
I say the random effect should be included in the model?

Marianne

Peter Flom

2009-03-09 21:11:02 UTC

One reason would be because it affects the parameters of the other fixed effects.

To my mind, random vs. fixed should not be based on results of this sort of thing, but on
what you are trying to do. Are you interested in these particular ZIP codes? Or did you just
select these ZIP codes at random from the USA? In the former case, use fixed, in the latter, random.

Post by Marianne Kang
2. I performed Moran's I test and a hot-spot analysis; both showed that the
distribution of events (late HIV testing cases) has a clustering pattern.
That is the reason why I include zipcode as a random variable.

That's a very good reason!

Why reject that reason for a not-so-good one (i.e., it wasn't significant)?

Post by Marianne Kang
3. If I run the model with zipcode as a random variable in PROC GLIMMIX, then
run it again with zipcode as a fixed variable, and the beta coefficients for
the fixed variables change a lot between the two models, say by more than 20%, can
I say the random effect should be included in the model?

See above.
I would say you want to compare either A) ZIP as random vs. ZIP ignored or
B) ZIP as fixed vs. ZIP ignored.

In A) you are trying to see if the clustering effect makes a difference in the fixed effects. Your eye already told you "Yes", but it can't hurt to see how much difference.

In B) ZIP is being used like any other variable. Maybe it just doesn't matter whether you include it, in which case you should favor the more parsimonious model.

But A) vs. B) is a substantive and design question, not a statistical one.

HTH

Peter

Swank, Paul R

2009-03-09 22:40:32 UTC

The data for the test are already there (0.07097 / 0.01402), which would say it is significant, but I don't know if I would trust it. Even in PROC MIXED, that test is not considered particularly reliable. I agree with CAM: does it make a difference in the solution if you do it different ways?
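
For reference, the ratio quoted above can be checked by hand. The sketch below is in Python rather than SAS, and it assumes that 0.07097 is the variance estimate for the zipcode intercept and 0.01402 its standard error, as they would appear in the Covariance Parameter Estimates table. That reading is an assumption, not something stated in the thread, and the helper name wald_z is made up for illustration.

```python
import math

def wald_z(estimate, std_err):
    """Return the Wald Z ratio and its one-sided upper-tail normal p-value."""
    z = estimate / std_err
    # P(Z > z) for a standard normal, via the complementary error function
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_sided

# Assumed inputs: variance estimate and standard error as quoted in the post.
z, p = wald_z(0.07097, 0.01402)
print(f"Z = {z:.3f}, one-sided p = {p:.2e}")
```

The ratio is a little over 5. The normal approximation behind a Wald test tends to be poor for variance components, which is one reason to be cautious about trusting it.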

Paul

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

Jerry Davis

2009-03-10 13:59:57 UTC

Post by Swank, Paul R
The data for the test are already there (0.07097 / 0.01402), which
would say it is significant, but I don't know if I would trust it. Even
in PROC MIXED, that test is not considered particularly reliable. I agree with CAM:
does it make a difference in the solution if you do it different ways?

The test you describe is not the test performed by the COVTEST statement.
COVTEST fits the full and reduced (alternative) models to the data. The
Chisq test statistic is the difference between the -2 log likelihoods
for the two models.
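
To make that arithmetic concrete, here is a minimal Python sketch of a likelihood ratio test for a single variance component. The two -2 log likelihood values are hypothetical, not output from this thread, and the function names are made up for illustration. Because a variance cannot be negative, the null value sits on the boundary of the parameter space, and a commonly used reference distribution is a 50:50 mixture of a point mass at zero and a chi-square with 1 df. The sketch reports both the naive and the mixture p-value and does not claim to reproduce exactly what SAS prints.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def lrt_variance_component(neg2ll_reduced, neg2ll_full):
    """Likelihood ratio test for one variance component.

    Returns (statistic, naive 1-df p-value, 50:50 mixture p-value).
    """
    stat = max(neg2ll_reduced - neg2ll_full, 0.0)
    naive_p = chi2_sf_1df(stat)
    # Mixture of chi-square(0) and chi-square(1): halve the 1-df tail area
    mixture_p = 0.5 * naive_p if stat > 0 else 1.0
    return stat, naive_p, mixture_p

# Hypothetical -2 log likelihoods for the model without and with the
# random intercept (illustrative values only).
stat, naive_p, mixture_p = lrt_variance_component(16210.4, 16150.2)
print(f"ChiSq = {stat:.1f}, naive p = {naive_p:.3g}, mixture p = {mixture_p:.3g}")
```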

Jerry

sudip chatterjee

2009-03-10 15:02:20 UTC

In generalized linear mixed models there is a method to calculate the ICC
(described by Snijders & Bosker). You can use that and check whether it is
greater than zero.
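
One common version of that ICC for a random-intercept logistic model treats the level-1 residual as standard logistic, with variance pi^2/3, so that ICC = var / (var + pi^2/3). A minimal Python sketch follows, assuming this latent-variable definition is the one meant and using 0.07097 from earlier in the thread as the assumed zipcode variance. The function name is made up for illustration.

```python
import math

def latent_icc_logit(random_intercept_var):
    """Latent-variable ICC for a random-intercept logistic model."""
    level1_var = math.pi ** 2 / 3.0  # variance of the standard logistic distribution
    return random_intercept_var / (random_intercept_var + level1_var)

# Assumed input: the zipcode variance estimate quoted earlier in the thread.
icc = latent_icc_logit(0.07097)
print(f"ICC = {icc:.4f}")
```

With that variance the ICC comes out at roughly 0.02, so about 2% of the latent-scale variation would sit between ZIP codes. Note that a point estimate above zero is not by itself a test.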

From the way you explained your problem, it seems it is always a good idea
to keep ZIP as random.

If you have already run the Moran's I test, then in the RANDOM statement
of PROC GLIMMIX you can use the type = lin() and ldata =
options,

where your ldata data set is the weight matrix (neighbours get 1 and others 0)
that you used for Moran's I.

Another check: if, after you run the model, you get a warning message that
your model is not positive definite, then I think you can consider
fixed rather than random.

Regards
