Discussion:
Another PROC MIXED question - marginal and conditional
Dale McLerran
2006-01-24 19:49:35 UTC
Hello
With ODS graphics in PROC MIXED, SAS can produce studentized,
Pearson, and raw residuals, each of which can be conditional or marginal.
How do these relate to the assumptions of the model?
y = Xbeta + Zgamma + epsilon
E(gamma) = E(epsilon) = 0
V(gamma) = G
V(epsilon) = R
I understand that studentization and Pearsonization (if that's the
word) are ways to standardize the raw numbers;
my question is more about the conditional vs. the marginal. I see
that (on p. 2764 in the SAS STAT manuals)
r_marginal_i = Y_i - x'_i*betahat
r_conditional_i = r_marginal_i - z'_i*gammahat
this seems to me to suggest that the marginal residuals are somehow
about G, and the conditional residuals about R.....but I am not at
all sure.....
Thanks as always
Peter
Peter L. Flom, PhD
Peter,

The conditional residuals are obtained as

Rc_i = Y_i - E(Y|x,z)
     = Y_i - (x'_i*betahat + z'_i*gammahat)

The marginal residuals are obtained as

Rm_i = Y_i - E(Y|x)
     = Y_i - x'_i*betahat

Suppose that you have a new cluster/subject that was not
part of your estimation model. Thus, you have no estimate
of the random effects which pertain to this new subject.
You cannot compute z'_i*gammahat, so you cannot compute
subject-specific (conditional) residuals. You can compute
the marginal residuals (assuming that there are no missing
values in the vector x_i).
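
A small numerical sketch of the two definitions may help here. It is written in Python rather than SAS, and everything in it is made up for illustration: the values of betahat and gammahat, the subject labels, and the data. It assumes a random-intercept model, so z'_i*gammahat is just the estimated intercept shift for the subject that observation i belongs to.

```python
# Hypothetical estimates from a random-intercept model (illustrative values only).
betahat = [2.0, 0.5]                    # fixed effects: intercept, slope
gammahat = {"s1": 1.0, "s2": -1.0}      # estimated random intercepts per subject

def marginal_residual(y, x):
    """Rm_i = Y_i - x'_i*betahat (uses the fixed effects only)."""
    return y - (betahat[0] + betahat[1] * x)

def conditional_residual(y, x, subject):
    """Rc_i = Y_i - (x'_i*betahat + z'_i*gammahat); needs the subject's gammahat."""
    if subject not in gammahat:
        raise KeyError(f"no random-effect estimate for subject {subject!r}")
    return marginal_residual(y, x) - gammahat[subject]

# Observations used in estimation: (subject, x, y)
data = [("s1", 0.0, 3.2), ("s1", 2.0, 3.9), ("s2", 0.0, 0.9), ("s2", 2.0, 2.2)]
for subject, x, y in data:
    rm = marginal_residual(y, x)
    rc = conditional_residual(y, x, subject)
    print(f"{subject} x={x:.1f} y={y:.1f}  Rm={rm:+.2f}  Rc={rc:+.2f}")

# A new subject "s3" was not in the estimation data: only Rm can be computed.
new_rm = marginal_residual(4.0, 2.0)
try:
    conditional_residual(4.0, 2.0, "s3")
    new_rc_available = True
except KeyError:
    new_rc_available = False
print(f"s3: Rm={new_rm:+.2f}, conditional residual available: {new_rc_available}")
```

For the new subject the marginal residual is still computable from x and betahat alone, while the conditional residual fails for lack of a gammahat.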

The marginal residuals, then, are residuals that have a
distribution which is quite nearly the population distribution
of residuals for your fixed-effect model. If you went out
into the world armed with your mixed model, you would only
be able to apply the fixed effect portion of the model.
The marginal residuals represent just how discrepant the
fixed effect model would be over the population of subjects
in the absence of knowledge of subject-specific effects.
(Well, this is not quite true, because the residuals are
obtained so as to achieve best fit in the observed data.)

Note that the marginal residuals do not represent how
discrepant the fixed effect model would be for a single
subject with unspecified subject-specific effects. The
residuals for a single subject with unspecified subject-
specific effects will tend toward all positive or all
negative due to the exclusion of the subject-specific
effects. Thus, the marginal residuals will be biased
for a given subject, with bias due to the subject-specific
effects which are excluded from the marginal model.
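
The sign pattern described above can be seen in a toy example. This is only a sketch under stated assumptions, in Python rather than SAS: a balanced random-intercept model, made-up data for three subjects, and variance components treated as known (PROC MIXED would estimate them). In this balanced layout ordinary least squares gives the same betahat as generalized least squares, and the estimated random intercept is the subject's mean marginal residual shrunk by var_g / (var_g + var_e/n).

```python
# Sketch: random-intercept model y_ij = b0 + b1*x_j + g_i + e_ij, balanced design.
# The data and the variance components below are made up for illustration.
x = [0.0, 1.0, 2.0, 3.0]                      # same x values for every subject
y = {
    "A": [3.1, 3.4, 4.2, 4.5],                # a subject sitting above the population line
    "B": [0.8, 1.7, 1.9, 2.6],                # a subject sitting below it
    "C": [2.1, 2.4, 3.1, 3.4],                # a subject close to it
}
var_g, var_e = 1.0, 0.1                       # assumed known V(gamma) and V(epsilon)

# Fixed effects by ordinary least squares on the pooled data; in this balanced
# random-intercept layout OLS and GLS give the same betahat.
xs = [xj for _ in y for xj in x]
ys = [v for vals in y.values() for v in vals]
n_tot = len(xs)
xbar, ybar = sum(xs) / n_tot, sum(ys) / n_tot
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum((a - xbar) ** 2 for a in xs)
b0 = ybar - b1 * xbar

n = len(x)
shrink = var_g / (var_g + var_e / n)          # BLUP shrinkage factor for a random intercept
marginal, conditional, gammahat = {}, {}, {}
for subj, vals in y.items():
    rm = [v - (b0 + b1 * xj) for v, xj in zip(vals, x)]
    gammahat[subj] = shrink * sum(rm) / n     # estimated subject-specific intercept
    marginal[subj] = rm
    conditional[subj] = [r - gammahat[subj] for r in rm]

for subj in y:
    print(subj, "Rm:", [f"{r:+.2f}" for r in marginal[subj]],
          "Rc:", [f"{r:+.2f}" for r in conditional[subj]])
```

With these numbers the marginal residuals of subject A are all positive and those of subject B all negative, while the conditional residuals of both subjects change sign from one observation to the next.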

The conditional residuals provide an indication of how well
your model estimates the response WHEN YOU KNOW SUBJECT-
SPECIFIC EFFECTS. The conditional residuals may be better
than the marginal residuals for model diagnostic purposes -
evaluating linearity and heteroskedasticity of the response
as a function of the variables included in your model.
That is, the conditional residuals may be useful for
constructing your model. However, the conditional residuals
will not provide a clear picture of how well the model
works in the population.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: ***@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

David L Cassell
2006-01-25 22:35:26 UTC
I see that Dale has already given his usual impressive answer. Let me just
toss some trafe in.

Think about marginal vs. conditional in the same way you think about
'marginal' when using PROC FREQ. It's *analogous* to an average of the
possible conditional means - but it is not an unbiased estimator, because
of the z'_i*gammahat part. So 'conditional' goes with 'conditioning on
the subject'.
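
To make the analogy concrete, here is a tiny Python sketch (not SAS) with hypothetical estimates. The gammahat values are chosen to average zero, in line with E(gamma) = 0; in a real fit they need not average exactly zero. Under that assumption, averaging the subject-specific (conditional) predictions over subjects gives back the marginal prediction.

```python
# Hypothetical fitted random-intercept model (illustrative values only).
betahat = [2.0, 0.5]                               # fixed intercept and slope
gammahat = {"A": 1.0, "B": -1.0, "C": 0.0}         # subject-specific intercept shifts

def conditional_mean(x, subject):
    """Prediction conditioning on the subject: x'betahat + z'gammahat."""
    return betahat[0] + betahat[1] * x + gammahat[subject]

def marginal_mean(x):
    """Prediction from the fixed effects alone: x'betahat."""
    return betahat[0] + betahat[1] * x

x = 2.0
cond = {s: conditional_mean(x, s) for s in gammahat}
avg_of_conditionals = sum(cond.values()) / len(cond)
print(cond)
print(avg_of_conditionals, marginal_mean(x))
```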

You can convince yourself that the two words actually make sense, if you try
hard enough. :-)

"I dare say you haven't had much practice," said the queen. "When I was your
age, I always did it for half an hour a day. Why, sometimes I've believed as
many as six impossible things before breakfast." - Lewis Carroll

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

Peter Flom
2006-01-26 11:41:50 UTC
Let me just say once again how helpful this list is.

some jokes.

I gotta get me to a SUGI to meet more of you

Peter
2006-01-26 12:16:50 UTC
Thanks to JW, PF and DC for their comments.

When I ran the frequency table for the IVs as suggested by DC, I found few observations for most combinations: frequencies like 2 and 3, and many of 1.

I think that I need to reduce the number of IVs, and that I must have more than 5 observations for each combination in order to have a reasonably accurate model.
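
The cell-count check can be sketched outside SAS as well. The Python below uses made-up records and a hypothetical helper, `sparse_cells`, to count observations per combination of categorical IVs and flag combinations that do not have more than 5 observations. The threshold is only the rule of thumb mentioned above, not an established cutoff.

```python
from collections import Counter

# Made-up records: each tuple holds the levels of three categorical IVs.
records = (
    [("M", "urban", "low")] * 3 + [("M", "urban", "high")] * 1
    + [("M", "rural", "low")] * 2 + [("F", "urban", "low")] * 1
    + [("F", "rural", "high")] * 3 + [("F", "rural", "low")] * 2
)
MIN_CELL = 5  # a cell is treated as adequate only if it has MORE than 5 observations

def sparse_cells(rows, keep):
    """Count observations per combination of the IVs at positions `keep`
    and return the combinations with MIN_CELL or fewer observations."""
    counts = Counter(tuple(row[i] for i in keep) for row in rows)
    return {cell: n for cell, n in counts.items() if n <= MIN_CELL}

print(sparse_cells(records, keep=(0, 1, 2)))   # all three IVs crossed
print(sparse_cells(records, keep=(0,)))        # first IV only
```

With all three IVs crossed every cell is sparse; keeping only the first IV leaves no sparse cells, which is the effect of reducing the number of IVs.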

In this case, which goodness-of-fit measure can I use to assess the model fit?

I have another query related to logistic regression:

Is it important to declare binary and categorical IVs in the CLASS statement, in either a binary or an ordered logistic regression?
I have noticed that in NESUG18 (Peter Flom) the categorical variables are not declared as class variables.
