Mixed Models: Can you specify a predictor as both fixed and random?

One of the most confusing things about mixed models arises from the way they’re coded in most statistical software.  Of the packages I’ve used, only HLM sets it up differently, so this confusion doesn’t apply there.

But the rest of them (SPSS, SAS, R’s lme and lmer, and Stata) all require the same pieces of information in their basic syntax:

1. The dependent variable

2. The predictor variables for which to calculate fixed effects, and whether those are categorical or continuous.  Each package has a different way of specifying them, but they all need to know that.

3. The predictor variables for which to calculate random effects, the level at which to calculate those effects, and, if there are multiple random effects, the covariance structure of those effects.

The confusion comes in when we specify the same predictor in both the fixed and random parts.  The syntax makes it look like we’re specifying the same predictor as both fixed and random.

But we’re not. And listing the predictor in both places is not only okay, it’s often the only way to write the model appropriately.

Let’s take a very simple example.  This is the same model I use in my free webinar Random Intercept and Random Slope Models.  If you haven’t seen it and want more detail, you can get the recording here.

The basic idea, though, is we’re comparing the economic growth over 5 decades between Rural and Metropolitan counties.

Economic growth is the outcome, measured in thousands of jobs (JobsK). JobsK is continuous.

County indicates from which county the observations come.  Each county has up to 5 measurements, and this is why we need the mixed model—to account for the inherent correlation among the multiple observations from the same county. County is categorical.

Time indicates the number of decades since 1960 and ranges from 0 to 4. It is treated as continuous.

And Rural is an indicator (aka dummy) variable for whether the county is rural. Rural is categorical.

SPSS

MIXED JobsK BY Rural WITH Time
  /FIXED=Rural Time Rural*Time
  /RANDOM=INTERCEPT Time | SUBJECT(County) COVTYPE(UN).

SAS

Proc mixed;
Class rural county;
Model JobsK=rural|time/solution;
Random int time/subject=county type=un;
run;

R’s lme

library(nlme)
model <- lme(JobsK ~ rural*time, random = ~time | County, data = countylong, na.action = na.omit)

Stata

mixed JobsK c.Time##Rural || County: Time, variance reml cov(un)

You can see here that Time is listed in the fixed portion of the model, which appears in SPSS’s FIXED subcommand, SAS’s MODEL statement, before the || in Stata, and in the model formula before the first comma in R.

And it’s also listed in the random portion, which appears in SPSS’s and SAS’s RANDOM statements, after the || in Stata, and in the random= argument after that first comma in R.

It looks like we’re treating Time as both fixed and random.  If we’re not, then what the heck are we doing?

The fixed portion is doing exactly what a linear model does.  It fits an overall regression line over time.  Since we have both Rural and a Rural*Time interaction, it actually fits two regression lines—one for the rural counties and one for the metropolitan counties.  The coefficient we get for Rural measures the difference in their intercepts and the coefficient for the interaction measures the difference in their slopes.

Just to emphasize: This fixed effect for time measures the overall effect for time across all counties.  It’s often called the population average effect, because it’s an estimate of the effect of time for the population of all counties.
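Written out as equations, the two regression lines from the fixed portion look like this. It is only a sketch under an assumed coding: Rural equals 1 for rural counties and 0 for metropolitan ones, and the β symbols are notation introduced here, not labels from any package’s output. Software that picks a different reference category will flip the signs of β1 and β3.

```latex
\begin{aligned}
E[\text{JobsK}] &= \beta_0 + \beta_1\,\text{Rural} + \beta_2\,\text{Time}
                 + \beta_3\,(\text{Rural}\times\text{Time}) \\[4pt]
\text{Metropolitan } (\text{Rural}=0):\quad
E[\text{JobsK}] &= \beta_0 + \beta_2\,\text{Time} \\
\text{Rural } (\text{Rural}=1):\quad
E[\text{JobsK}] &= (\beta_0+\beta_1) + (\beta_2+\beta_3)\,\text{Time}
\end{aligned}
```

Under that coding, β1 is the difference in intercepts between the two lines and β3 is the difference in their slopes.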

Okay, so what is that random effect of time?  Aren’t we making Time random as well as fixed?

As I said earlier, no.

A key part of the random statement is the identification of the Subject.  In this example, it’s County.  It’s really County that is a random factor in the model and we’re specifying two random effects for those Counties—an intercept and a slope over Time.

The random slope for Time at the County level means that the slope across time varies across Counties.  In other words, the effect of Time on Jobs (the slope) is different for different values of County.
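One way to write the whole model, with the random intercept and random slope attached to County, is sketched below. The notation is an assumption made for illustration (i indexes the measurement occasion, j indexes the county, Rural is coded 1 for rural and 0 for metropolitan); none of these symbols come from the software output.

```latex
\begin{aligned}
\text{JobsK}_{ij} &= (\beta_0 + u_{0j}) + \beta_1\,\text{Rural}_j
                   + (\beta_2 + u_{1j})\,\text{Time}_{ij}
                   + \beta_3\,\text{Rural}_j\,\text{Time}_{ij}
                   + \varepsilon_{ij} \\[4pt]
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  &\sim N\!\left(
     \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
   \right),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{aligned}
```

Here u_{0j} and u_{1j} are County j’s deviations from the average intercept and the average slope. The 2×2 matrix is the unstructured covariance requested with type=un, covtype(UN), or cov(un) in the syntax above, and τ11 is the variance of the random slope for Time.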

If you are thinking that it sounds like we’re really fitting an interaction between Time and County, then you would be correct. We are.

Because this slope is a random effect, we don’t measure this interaction through a regression coefficient as we would if it were fixed.

Instead, we measure how much each County’s slope differs from the population average slope, then find the variance of these difference measures.  That’s the variance estimate for the random slope.

If that variance comes out to 0, it indicates that the slope of Time on Jobs is actually the same for all counties—they don’t vary from each other.
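The intuition in the last few paragraphs can be sketched in a few lines of Python. This is not what any of the packages actually do: they estimate everything at once by ML or REML, whereas this shortcut fits a separate least-squares line to each county and then takes the variance of the slopes. The data are simulated, and every name and number here (simulate_counties, the averages of 10 and 2, the standard deviations) is made up for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x for a single county."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_counties(n_counties, slope_sd, noise_sd, seed):
    """Hypothetical data: each county gets its own intercept and slope."""
    rng = random.Random(seed)
    times = [0, 1, 2, 3, 4]               # decades since 1960
    avg_intercept, avg_slope = 10.0, 2.0  # made-up population averages
    data = {}
    for county in range(n_counties):
        intercept = avg_intercept + rng.gauss(0, 3.0)
        slope = avg_slope + rng.gauss(0, slope_sd)
        jobs = [intercept + slope * t + rng.gauss(0, noise_sd) for t in times]
        data[county] = (times, jobs)
    return data


def slope_variance(data):
    """Average county slope and the variance of the deviations from it."""
    slopes = [ols_slope(x, y) for x, y in data.values()]
    avg = statistics.fmean(slopes)
    deviations = [s - avg for s in slopes]
    return avg, statistics.variance(deviations)


if __name__ == "__main__":
    for sd in (0.0, 1.5):
        counties = simulate_counties(200, slope_sd=sd, noise_sd=1.0, seed=1)
        avg, var = slope_variance(counties)
        print(f"true slope SD {sd}: average slope {avg:.2f}, slope variance {var:.2f}")
```

One caveat on the shortcut: each county’s fitted slope also carries sampling noise from its five observations, so this two-step variance tends to be larger than the random-slope variance a mixed model reports. The mixed model separates the two sources.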

Now of course, we’re not doing these steps directly.  But that is basically what the model is doing, through a lot of complicated statistical algorithms.

So, to reiterate the central point: Time in the fixed statement measures the overall effect of time on jobs across all counties.  Time in the random statement measures the variance in the effects of time on jobs across counties.  It looks the same in the syntax, but it’s actually a very different concept.

 

Comments

  1. Carrie Gardullo says

    How exactly do we interpret the random intercepts given that there is an interaction in the fixed effects? For example, if the random intercept for a rural county is significantly more positive in the BLUPS, do we say that it has a more positive intercept relative to all other counties, or is it just positive relative to other rural counties?

      • Carrie Gardullo says

        thanks! Is this because rural is the reference level? How would we interpret the random intercept for a metropolitan county if it is significantly more positive in the BLUPS? It’s just a bit confusing to figure out what the population-level intercept is that we are comparing to the random intercept of each county.

          • Carrie Gardullo says

            Thank you! How would I extract the population-level intercept for rural and the separate population-level intercept for metro counties out of the model? I would like to add the population-level intercept for rural counties to the rural county BLUPS and the population-level intercept for metro counties to the metro county BLUPS. However, so far I can only see how to get the population-level intercept for the reference level (rural).

  2. Leo says

    Karen, thanks for this explanation. So, are there situations where the same variable need not appear in both FE and RE? What if I have a level-2 predictor that does not vary at level 1 at all (e.g., state-level policy attributes that apply to everybody in the same state)? Thank you.

  3. Dasha says

    Hi Karen, I am wondering whether it is possible that, for instance, the fixed time effect does not turn out to be significant but the random effect for time does? In such a case, would it be ok to state the model as follows? Thank you!

    model<-lme(JobsK~rural, random=~time|County,data=countylong)

    • Karen Grace-Martin says

      Hi Dasha,

      Totally possible.

      The fixed effect is testing whether the average effect for all counties (for example for Time) = 0. The random effect is testing whether the effect of time is the same for all counties (variance among the counties=0).

  4. Salah Lotfi says

    Hi Karen,
    This was very intuitive and helpful. Thanks for taking time and making difficult topics easy to understand.

  5. Alexander says

    Hi Karen, I have a question concerning this topic. When I build my model, would I introduce the random slopes last or first, compared to the fixed effects? In my example the final model itself works, but if I introduce factor1 as a random slope first, adding it as a fixed factor after that does not significantly increase model fit. If I add it as a fixed factor first, it significantly increases model fit (p<.001).
    I would be interested in the justification of using a model like that. Which way would you argue?

  6. Patrick says

    Hey Karen,

    you are producing high quality content. I’m really glad I found this website. Same goes for the webinars.

    Do you think you could write a few lines about covariance structures in repeated measures mixed models some day? That’d be really great 🙂

    Patrick

  7. skan says

    Hello.

    Is it possible to have a model such as (in lme4 notation)
    Y ~ x + ID + (1|ID)
    Where a variable appears both as the fixed effect and as the subject of the random effect?

    • Karen Grace-Martin says

      Hi Skan,
      No. That is specifying the variable as both fixed and random. What you can do is include a level-1 variable in both the fixed portion and as a random slope across subject.

  8. PRAVATA KUMAR DASH says

    Hi Karen,

    Thank you very much for the explanation. However, one question always crops up in my mind: I have a response variable which is sales at sku level. Suppose in a random effect model we are trying to get random effects for a media variable on different skus (10 skus) using SAS. But the covariance parameter is not significant and hence there is no random effect. Then we try interaction effect sku*Media_TV in the model statement in SAS. If it comes out to be significant, we have interaction effect and hence different slope coefficients for different skus. How do we interpret this and what is the difference between interaction effect and random effect here, from a business point of view?

    • Karen Grace-Martin says

      Pravata,

      I’m not following your design well enough to actually give you advice.

      I will say, though, that you shouldn’t rely on the p-values for covariance parameter estimates. They’re considered unstable unless you have an enormous data set.

