Discussion:
Offset statements in PROC GENMOD
Richard Van Dorn
2009-10-13 17:26:40 UTC
Hello all. I have a question regarding the use of an offset statement.

Here’s the scenario: subjects were interviewed twice, but the number of
days between the two interviews varied (e.g., from 500 to 1000 days). The
outcome of interest was violence (yes or no). Those with 1000 days between
interviews would have had more opportunities to be violent than those with
fewer days between interviews. In this example, is it more correct to use
an offset (i.e., the log of the number of days) or to include a covariate
measuring the number of days between the interviews? (This would be a
GENMOD model with a binomial distribution.)

My original intent was to use the offset statement; however, I wasn’t sure
if that was “more” correct than including a covariate in the model (and I
couldn’t find any literature that was specific to this question). Thank you
for any input that you can provide regarding this!

Richard
Ryan
2009-10-13 19:14:05 UTC
Richard,

As suggested by Peter McCullagh and John A. Nelder in "Generalized
Linear Models" (2nd edition), you should probably first estimate the
coefficient of log(# of days). Assuming it is close to 1 (taking into
consideration the standard error) AND it makes intuitive sense to
treat it as an offset, then it is probably safe to treat it as such.
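
A minimal sketch of that two-step check in PROC GENMOD follows. The dataset and variable names (work.interviews, violent, days, age) are hypothetical, and the logit link is an assumption carried over from the original question rather than a recommendation.

```sas
/* Hypothetical names: work.interviews, violent (0/1), days, age */
data work.interviews2;
  set work.interviews;
  logdays = log(days);   /* natural log of days between interviews */
run;

/* Fit 1: log(days) as an ordinary covariate. Compare its estimate
   and confidence limits with 1. */
proc genmod data=work.interviews2 descending;
  model violent = age logdays / dist=binomial link=logit;
run;

/* Fit 2: coefficient of log(days) fixed at 1 through OFFSET=. */
proc genmod data=work.interviews2 descending;
  model violent = age / dist=binomial link=logit offset=logdays;
run;
```

Fit 2 is nested in Fit 1 (it restricts one coefficient to 1), so the difference in -2 log likelihood between the two fits can be referred to a chi-square distribution with 1 degree of freedom.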

HTH,

Ryan
Shawn Haskell
2009-10-14 14:20:18 UTC
I don't think either approach is wrong, so maybe it's a question of
which makes more sense for your study. Would "# days" be an
interesting covariate (i.e., "independent" variable) predicting violent
acts? If not, it would probably serve better as an offset term, thus
making violent acts more of a rate than an occurrence or simple count.
Using "# days" as an offset term inherently assumes that it makes a
difference, so if you are willing to blindly concede that, it sounds
to me like it may not be a very interesting covariate.

My experience with offset variables is pretty much just from wildlife
surveys where you have count data, but distance surveyed was different
among surveys for various reasons, so ln(distance surveyed) served as
the offset to make a sighting rate, rather than simple counts. Maybe
you'll get a response from someone with different experience.
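
For reference, the survey setup described above might look roughly like this in PROC GENMOD. All names here (work.surveys, sightings, km_surveyed, habitat) are made up for illustration, and the Poisson distribution is an assumption about the counts.

```sas
/* Hypothetical names: work.surveys, sightings (count),
   km_surveyed (distance covered), habitat (classification) */
data work.surveys2;
  set work.surveys;
  log_km = log(km_surveyed);   /* offset must be on the log scale */
run;

proc genmod data=work.surveys2;
  class habitat;
  /* With a log link, OFFSET=log_km models sightings per km surveyed */
  model sightings = habitat / dist=poisson link=log offset=log_km;
run;
```

Because the link is log, adding log_km with a fixed coefficient of 1 is the same as modeling the expected count divided by distance, which is what turns counts into a sighting rate.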
Dale McLerran
2009-10-14 16:54:11 UTC
Post by Ryan
As suggested by Peter McCullagh and John A. Nelder in "Generalized
Linear Models" (2nd edition), you should probably first estimate the
coefficient of log(# of days). Assuming it is close to 1 (taking into
consideration the standard error) AND it makes intuitive sense to
treat it as an offset, then it is probably safe to treat it as such.
Fitting the model with an offset parameter expressed as
log(T) makes sense when fitting a model in which count
data are assumed to follow a Poisson or negative binomial
distribution. Since the outcome here is a binary response,
presumably modeled employing a logit transformation, I
question whether a log transformation applied to the time
variable would be appropriate.
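
One way to see the concern is a short derivation under an assumption that is not stated anywhere in this thread: that violent events for subject i occur as a Poisson process with constant rate \lambda_i over the T_i days between interviews, and that the binary outcome records whether at least one event occurred.

```latex
\begin{align*}
p_i = \Pr(Y_i = 1) &= 1 - \exp(-\lambda_i T_i),
  \qquad \lambda_i = \exp(x_i^\top \beta) \\
\log\{-\log(1 - p_i)\} &= x_i^\top \beta + \log T_i \\
\operatorname{logit}(p_i) &= \log\{\exp(\lambda_i T_i) - 1\}
\end{align*}
```

Under that assumption, log(T) enters with coefficient exactly 1 on the complementary log-log scale (LINK=CLOGLOG in GENMOD), but not on the logit scale. The logit expression is close to x'β + log(T) only when λT is small, that is, when the event is rare.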

Still, I believe the general approach of employing a
function of time first as a covariate is indeed the best
approach. It is a strong assumption to restrict the
coefficient of a variable to be 1. That is exactly what
an offset parameter does.

I would even suggest that the OP consider alternate
parameterizations of time - that the response may not be
linear in time (or log(time)). It may be that a quadratic
function is required. Alternate parameterizations of the
time variable can be examined by constructing a likelihood
ratio test when higher order terms are added to the model
or by examining which parameterization, time or log(time)
or sqrt(time) or ... produces the best likelihood (smallest
-2LL). Yet another way to examine whether a different
parameterization of a continuous predictor variable might
be necessary is to use the ASSESS statement along with
ODS graphics. Consult the documentation of the GENMOD
procedure for syntax and examples of using the ASSESS
statement.
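
A rough sketch of these checks, with hypothetical dataset and variable names (work.interviews, violent, days) and an arbitrary seed and resample count:

```sas
ods graphics on;

/* Hypothetical names: work.interviews, violent (0/1), days */
data work.interviews3;
  set work.interviews;
  logdays = log(days);
  days2   = days*days;   /* quadratic term */
run;

/* Linear in days, with a cumulative-residual check of that functional form */
proc genmod data=work.interviews3 descending;
  model violent = days / dist=binomial link=logit;
  assess var=(days) / resample=1000 seed=20091014;
run;

/* Nested comparison: TYPE1 gives a sequential likelihood ratio test
   for adding the quadratic term after the linear term */
proc genmod data=work.interviews3 descending;
  model violent = days days2 / dist=binomial link=logit type1;
run;

/* Non-nested alternative: compare its log likelihood with the linear fit */
proc genmod data=work.interviews3 descending;
  model violent = logdays / dist=binomial link=logit;
run;
```

The linear and log(time) fits have the same number of parameters, so their log likelihoods can be compared directly. The likelihood ratio test applies only to the nested linear versus quadratic comparison.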

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: ***@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
