One of the most confusing things about mixed models arises from the way they’re coded in most statistical software. Of the packages I’ve used, only HLM sets things up differently, so this doesn’t apply to it.

But for the rest of them (SPSS, SAS, R’s lme and lmer, and Stata), the basic syntax requires the same pieces of information.

1. The dependent variable

2. The predictor variables for which to calculate fixed effects and whether those are categorical or continuous. Each software has a different way of specifying them, but they all need to know that.

3. The predictor variables for which to calculate random effects, the level at which to calculate those effects, and if there are multiple random effects, the covariance structure of those effects.

The confusion comes in when the same predictor appears in both the fixed and random parts of the syntax. It looks like we’re declaring that one predictor to be both a fixed and a random factor.

But we’re not. It’s not only okay, it’s often the only way to write the model appropriately.

Let’s take a very simple example. This is the same model I use in my free webinar Random Intercept and Random Slope Models. If you haven’t seen it and want more detail, you can get the recording here.

The basic idea, though, is we’re comparing the economic growth over 5 decades between Rural and Metropolitan counties.

Economic growth is the outcome, measured in thousands of jobs (JobsK). JobsK is continuous.

County indicates from which county the observations come. Each county has up to 5 measurements, and this is why we need the mixed model—to account for the inherent correlation among the multiple observations from the same county. County is categorical.

Time indicates number of decades since 1960, and ranges from 0 to 4. Treated as continuous.

And Rural is an indicator (aka dummy) variable for whether the county is rural. Rural is categorical.

### SPSS

`MIXED JobsK BY Rural WITH Time`

`/FIXED=Rural Time Rural*Time`

`/RANDOM=INTERCEPT Time | SUBJECT(County) COVTYPE(UN).`

### SAS

`Proc mixed;`

`Class rural county;`

`Model JobsK=rural|time/solution;`

`Random int time/subject=county type=un;`

`run;`

### R’s lme

`model <- lme(JobsK ~ rural*time, random = ~time|County, data = countylong, na.action = na.omit)`

### Stata

`mixed JobsK c.Time##Rural || County: Time, variance reml cov(un)`

You can see here that Time is listed in the fixed portion of the model, which appears in SPSS’s FIXED subcommand, SAS’s MODEL statement, before the || in Stata, and in the first formula (before the first comma) in R.

And it’s also listed in the random portion, which appears in SPSS’s RANDOM subcommand and SAS’s RANDOM statement, after the || in Stata, and in the random= argument in R.

It looks like we’re treating Time as both fixed and random. If we’re not, then what the heck are we doing?

The fixed portion is doing exactly what a linear model does. It fits an overall regression line over time. Since we have both Rural and a Rural*Time interaction, it actually fits two regression lines—one for the rural counties and one for the metropolitan counties. The coefficient we get for Rural measures the difference in their intercepts and the coefficient for the interaction measures the difference in their slopes.
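To make the "two regression lines" reading concrete, here is a minimal sketch in Python with NumPy. It is not the mixed model itself, only the fixed portion fit by ordinary least squares. The toy data (four counties, no noise) and the assumed true lines (metropolitan: 50 + 10·Time, rural: 20 + 3·Time) are made up for illustration and are not from the webinar data.

```python
import numpy as np

# Hypothetical toy data: 4 counties x 5 decades, no noise, so the fit is exact.
time = np.tile(np.arange(5.0), 4)            # decades since 1960: 0..4
rural = np.repeat([0.0, 0.0, 1.0, 1.0], 5)   # 1 = rural, 0 = metropolitan

# Assumed true lines (illustration only):
#   metropolitan: JobsK = 50 + 10*Time
#   rural:        JobsK = 20 +  3*Time
jobs_k = np.where(rural == 1, 20 + 3 * time, 50 + 10 * time)

# Design matrix for the fixed portion: intercept, Rural, Time, Rural*Time
X = np.column_stack([np.ones_like(time), rural, time, rural * time])
beta, *_ = np.linalg.lstsq(X, jobs_k, rcond=None)
b0, b_rural, b_time, b_inter = beta

# The two regression lines implied by the four coefficients
metro_line = (b0, b_time)                        # (intercept, slope)
rural_line = (b0 + b_rural, b_time + b_inter)    # (intercept, slope)

print("Rural coefficient (difference in intercepts):", round(b_rural, 3))
print("Rural*Time coefficient (difference in slopes):", round(b_inter, 3))
```

With Rural coded 0/1, the intercept and Time coefficients describe the metropolitan line, and adding the Rural and Rural*Time coefficients gives the rural line.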

Just to emphasize: This fixed effect for time measures the overall effect for time across all counties. It’s often called the population average effect, because it’s an estimate of the effect of time for the population of all counties.

Okay, so what is that random effect of time? Aren’t we making Time random as well as fixed?

As I said earlier, no.

A key part of the random statement is the identification of the Subject. In this example, it’s County. It’s really County that is a random factor in the model and we’re specifying two random effects for those Counties—an intercept and a slope over Time.

The random slope for Time at the County level means that the slope across time varies across Counties. In other words, the effect of Time on Jobs (the slope) is different for different values of County.
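Written out as an equation, the model that all four syntaxes describe looks roughly like this. The notation is my own assumption, not something any of the packages print: i indexes counties, j indexes the measurements within a county, the β terms are the fixed effects, and the u terms are the County-level random effects.

```latex
\begin{aligned}
\text{JobsK}_{ij} &= \underbrace{\beta_0 + \beta_1\,\text{Rural}_i + \beta_2\,\text{Time}_{ij}
                     + \beta_3\,\text{Rural}_i\,\text{Time}_{ij}}_{\text{fixed portion}}
                   + \underbrace{u_{0i} + u_{1i}\,\text{Time}_{ij}}_{\text{random portion}}
                   + \varepsilon_{ij} \\[4pt]
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{aligned}
```

Time shows up twice, once multiplied by the single coefficient β₂ and once multiplied by the county-specific deviation u₁ᵢ. What the software reports for the random slope is its variance, τ₁₁, and the unstructured covariance type in the syntax above lets the intercept and slope variances differ and lets the two random effects covary (τ₀₁).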

If you are thinking that it sounds like we’re really fitting an interaction between Time and County, then you would be correct. We are.

Because this slope is a random effect, we don’t measure this interaction through a regression coefficient as we would if it were fixed.

Instead, we measure how much each County’s slope differs from the population average slope, then find the variance of these difference measures. That’s the variance estimate for the random slope.

If that variance comes out to 0, it indicates that the slope of Time on Jobs is actually the same for all counties—they don’t vary from each other.

Now of course, we’re not doing these steps directly. But that is basically what the model is doing, through a lot of complicated statistical algorithms.
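If it helps to see those conceptual steps spelled out, here is a rough Python sketch with NumPy. To be clear, this is not what mixed-model software does: the real estimates come from maximum likelihood or REML, and this naive two-step version would overstate the slope variance on noisy data because each county's fitted slope also carries estimation error. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def naive_slope_variance(time, jobs_k):
    """Two-step illustration (not REML): fit one line per county,
    then take the variance of the slopes around their average."""
    # Step 1: one ordinary least-squares line per county (one row per county).
    # np.polyfit returns coefficients highest degree first, so [0] is the slope.
    slopes = np.array([np.polyfit(time, y, 1)[0] for y in jobs_k])
    # Step 2: how far each county's slope is from the average slope,
    # and the variance of those differences.
    deviations = slopes - slopes.mean()
    return slopes, float(np.sum(deviations ** 2) / (len(slopes) - 1))

time = np.arange(5.0)                        # decades since 1960: 0..4
intercepts = np.array([10.0, 20.0, 30.0])    # three made-up counties

# Counties whose slopes differ (1, 2 and 3 thousand jobs per decade)
varying = intercepts[:, None] + np.array([1.0, 2.0, 3.0])[:, None] * time
# Counties that share one slope (2 thousand jobs per decade)
same = intercepts[:, None] + 2.0 * time

slopes_varying, var_varying = naive_slope_variance(time, varying)
slopes_same, var_same = naive_slope_variance(time, same)

print("slope variance when slopes differ:", round(var_varying, 6))
print("slope variance when slopes are equal:", round(var_same, 6))
```

The second case is the situation described above: when every county has the same slope, the variance of the slopes is zero, even though the counties still differ in their intercepts.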

So, to reiterate the central point: Time in the fixed statement measures the overall effect of time on jobs across all counties. Time in the random statement measures the variance in the effects of time on jobs across counties. It looks the same in the syntax, but it’s actually a very different concept.

### Comments

Hi Karen,

This was very intuitive and helpful. Thanks for taking time and making difficult topics easy to understand.

Hi Karen, I have a question concerning this topic. When I build my model, should I introduce the random slopes first or last, relative to the fixed effects? In my example the final model itself works, but if I introduce factor1 as a random slope first, adding it as a fixed factor after that does not significantly increase model fit. If I add it as a fixed factor first, it significantly increases model fit (p < .001).

I would be interested in the justification of using a model like that. Which way would you argue?

Hey Karen,

you are producing high quality content. I’m really glad I found this website. Same goes for the webinars.

Do you think you could write a few lines about covariance structures in repeated measures mixed models some day? That’d be really great 🙂

Patrick

Hi Patrick,

That is actually a difficult topic because there are two different places in mixed models where you specify covariance structures–the G matrix and the R matrix. Unfortunately, the structures that make sense in one don’t make sense in the other. And also unfortunately, I have found that understanding the difference between them is the biggest stumbling block for understanding mixed models in repeated measures. We spend hours on it in my Analyzing Repeated Measures workshop.

People have a lot of lightbulb moments in that workshop because we’re so deliberate about how we teach it. But it really does take hours to explain.

All that said, I have written a few things on the topic here:

https://www.theanalysisfactor.com/covariance-matrices/

https://www.theanalysisfactor.com/mixed-models-repeated-measures-g-side-r-side/

https://www.theanalysisfactor.com/unstructured-covariance-matrix-when-it-does-and-doesn%e2%80%99t-work/

This is very helpful! Thank you!

Hello.

Is it possible to have a model such as (in lme4 notation)

Y ~ x + ID + (1|ID)

Where a variable appears both as the fixed effect and as the subject of the random effect?

Hi Skan,

No. That is specifying the variable as both fixed and random. What you can do is include a level-1 variable in both the fixed portion and as a random slope across subject.

Thanks for a very nice answer to the question!!

Hi Karen,

Thank you very much for the explanation. However, one question always crops up in my mind: I have a response variable which is sales at sku level. Suppose in a random effect model we are trying to get random effects for a media variable on different skus (10 skus) using SAS. But the covariance parameter is not significant and hence there is no random effect. Then we try interaction effect sku*Media_TV in the model statement in SAS. If it comes out to be significant, we have interaction effect and hence different slope coefficients for different skus. How do we interpret this and what is the difference between interaction effect and random effect here, from a business point of view?

Pravata,

I’m not following your design well enough to actually give you advice.

I will say, though, that you shouldn’t rely on the p-values for covariance parameter estimates. They’re considered unstable unless you have an enormous data set.