Are you learning Multilevel Models? Do you feel ready? Or in over your head?
It’s a very common analysis to need, but I have to say, it is not so easy to learn on your own. The concepts of random effects are hard to wrap your head around and there is a ton of new vocabulary and notation. Sadly, this vocabulary and notation are not consistent across articles, books, and software, so you end up having to do a lot of translating.
When you hear about multilevel models or mixed models, you very often think of a nested design. Level 1 units nested in Level 2 units, which are in turn possibly nested in Level 3 units. But these variables that define the units and that become random factors in the model can, in fact, be crossed with each other, not nested.
Mixed models with crossed random factors are a little trickier to wrap your head around than mixed models with nested random factors. They still involve some nesting. But they’re not harder to analyze and they are quite common in many fields. Recognizing when you have one and knowing how to analyze the data when you do are important statistical skills.
The Nested Multilevel Design
Let’s start by reviewing the more common design: nested. The most straightforward use of Mixed Models is when observations are clustered or nested in some higher group.
It’s also so common that it often has its own name: multilevel model.
Examples include studies where patients share the same doctor, plants grow in the same field, or participants respond to multiple experimental conditions.
The units of observation at Level 1 (patient, plant, response) are clustered at Level 2 (doctor, field, or participant). This makes the responses from the same cluster correlated.
In these models, the Level 2 cluster is not something you’re interested in testing hypotheses about. It’s what we call a “blocking factor.” Even so, you need to control for its effects.
If the researcher would like to generalize the results to all doctors, fields, or participants, these clustering variables are random factors. You account for and measure their effects by including random intercepts and, where needed, random slopes across the factor for any Level 1 predictor.
The observations of the dependent variable are always measured on the Level 1 unit (the patient, plant, or response). Predictor variables (fixed effects) can be measured at either Level 1 or Level 2. For example, number of years of experience of a doctor would be at Level 2, measured for each doctor. But patient age would be measured at Level 1, measured for each patient.
You assume the values of the response variable within a cluster are correlated, but the observations from different clusters are independent.
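One common way to write this assumption down is the random-intercept model. The notation is an assumption made here for illustration (the article itself introduces no symbols): y_ij is the response for Level 1 unit i in cluster j, x_ij is a Level 1 predictor, u_j is the random intercept for cluster j, and e_ij is the residual, with u_j and e_ij independent.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \quad (i \neq i'),
\qquad
\operatorname{Corr}(y_{ij}, y_{i'j'}) = 0 \quad (j \neq j')
```

Two observations from the same cluster share u_j, which is what makes them correlated. Observations from different clusters share nothing, so under this model they are independent.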
A third level (or more) is possible as well. This would happen if each doctor sees all their patients at one of four hospitals or each field has only one of 5 species.
The Crossed Multilevel Design
In one kind of 2-level design, there is not one random factor at Level 2, but two crossed factors. Each is a different random factor and they’re crossed with each other.
Each observation at Level 1 is nested in the combination of these two random factors. These models need to be specified correctly to capture the effects of both random factors at Level 2.
Here are the same examples with crossed random factors:
Every patient (Level 1) sees their Doctor (Random Factor at Level 2) at one of four Hospitals (Random Factor at Level 2) for a study comparing a new drug treatment for diabetes to an old one.
Each doctor sees patients at each of the hospitals. That means Hospital and Doctor are crossed. (If each doctor worked at only one hospital, Doctor would be nested within Hospital.) Patient responses vary across doctors and hospitals.
Because each Patient sees a single doctor at a single hospital, patients are nested in the combination of Doctor and Hospital.
The response is measured at Level 1–the patient. Predictors can occur at Level 1 (age, diet) or either Level 2 factor (years of practice by doctor, size of hospital).
The analysis would need to include, at a minimum, a random intercept for Doctor and a random intercept for Hospital.
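As a sketch of what that minimum model looks like, here it is written out with symbols that are assumptions for illustration only: y_ijk is the response of patient i seeing doctor j at hospital k, drug_ijk indicates the new versus old treatment, u_j is the Doctor random intercept, and v_k is the Hospital random intercept.

```latex
y_{ijk} = \beta_0 + \beta_1 \,\mathrm{drug}_{ijk} + u_j + v_k + e_{ijk},
\qquad u_j \sim N(0, \sigma_u^2), \quad v_k \sim N(0, \sigma_v^2), \quad e_{ijk} \sim N(0, \sigma_e^2)

\operatorname{Cov}(y_{ijk}, y_{i'jk'}) = \sigma_u^2 \ (k \neq k'), \qquad
\operatorname{Cov}(y_{ijk}, y_{i'j'k}) = \sigma_v^2 \ (j \neq j'), \qquad
\operatorname{Cov}(y_{ijk}, y_{i'jk}) = \sigma_u^2 + \sigma_v^2 \ (i \neq i')
```

Patients who share only a doctor are correlated through u_j, patients who share only a hospital through v_k, and patients who share both through the sum. Neither intercept is indexed inside the other, which is what separates a crossed specification from a nested one.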
An agricultural study examines plants in 6 fields.
While there are many species of plants in each field, the researcher randomly chooses 5 species to be in the study. Each of the 5 species is found in every field.
Each individual plant (Level 1 unit) grows within one combination of species and field. Since every species is in every field, Species and Field are crossed at Level 2.
The response (nitrogen uptake) is measured at Level 1–the plant. Predictors can occur at Level 1 (height of plant) or either Level 2 factor (type of fertilizers applied to the field, whether the species is native or introduced).
In a social psychology experiment on first impressions, subjects rate statements that describe behaviors done by a fictional person, Bob.
On each trial, subjects rate whether or not they find Bob’s behavior friendly. The response time of the rating is recorded. Trial is the Level 1 unit.
Each subject sees the same 10 friendly and 10 unfriendly behaviors. The behaviors are not in themselves of interest to the experimenter, but are representative of all friendly and unfriendly behaviors that Bob could perform.
Because responses to the same behavior tend to be similar, it is necessary to control for their effects. After all, even within friendly behaviors, some (giving a gift) may be generally rated more friendly than others (holding a door open). Each trial of the experiment (Level 1) is nested within the combination of Subject and Behavior, which are both random factors at Level 2.
Subject and Behavior are crossed at Level 2 since every Subject rates every Behavior. The response is measured at Level 1–the trial. Predictors can occur at Level 1 (a distractor occurs on some trials) or either Level 2 factor (Behavior is friendly or not, Subject is put into positive, neutral, or negative mood).
Luckily, standard mixed modeling procedures such as SAS Proc Mixed, SPSS Mixed, Stata’s mixed, or R’s lmer can all easily run a mixed model with crossed random effects. (R’s lme is built around nested grouping, so crossed factors are awkward to specify there.)
However, I’ve also seen issues with software that is designed specifically for multilevel (aka nested) designs. Such software assumes that all random factors are nested within each other. For example, a member was once trying to use software designed for estimating sample sizes in multilevel models. It would only allow one random factor at Level 2, so it just didn’t work for that design.
At a minimum, each random factor needs a random intercept. The random factor itself is defined as the “subject” in the random part of the mixed model. You need two. You don’t need to specify to the software that the two random factors are crossed. With the data in long format, your software can tell.
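To make the "your software can tell" point concrete, here is a minimal Python sketch of how crossing versus nesting can be read off long-format data. It illustrates the idea only and is not how any of the packages named above implement it. The function name `classify_factors` and the doctor/hospital rows are hypothetical.

```python
from collections import defaultdict


def classify_factors(rows, a, b):
    """Describe how two grouping factors relate in long-format data.

    rows is a list of dicts, one per Level 1 observation; a and b are the
    keys that hold the two factors.
    """
    b_levels_by_a = defaultdict(set)
    a_levels_by_b = defaultdict(set)
    for row in rows:
        b_levels_by_a[row[a]].add(row[b])
        a_levels_by_b[row[b]].add(row[a])

    # Nested: every level of one factor appears under exactly one level of the other.
    if all(len(levels) == 1 for levels in b_levels_by_a.values()):
        return f"{a} nested in {b}"
    if all(len(levels) == 1 for levels in a_levels_by_b.values()):
        return f"{b} nested in {a}"
    # Fully crossed: every level of a occurs with every level of b.
    if all(len(levels) == len(a_levels_by_b) for levels in b_levels_by_a.values()):
        return "fully crossed"
    return "partially crossed"


# Hypothetical data: each of 3 doctors sees 2 patients at each of 2 hospitals.
crossed_rows = [
    {"patient": f"{d}-{h}-{p}", "doctor": d, "hospital": h}
    for d in ("D1", "D2", "D3")
    for h in ("H1", "H2")
    for p in (1, 2)
]

# Hypothetical data: each doctor works at only one hospital.
nested_rows = [
    {"patient": 1, "doctor": "D1", "hospital": "H1"},
    {"patient": 2, "doctor": "D1", "hospital": "H1"},
    {"patient": 3, "doctor": "D2", "hospital": "H1"},
    {"patient": 4, "doctor": "D3", "hospital": "H2"},
]

print(classify_factors(crossed_rows, "doctor", "hospital"))
print(classify_factors(nested_rows, "doctor", "hospital"))
```

Running a check like this (or a simple cross-tabulation of the two factors in your statistics package) before fitting the model is a quick way to confirm which design you actually have.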
Where it gets tricky is when deciding which random slopes you can include in the model. Each random factor can potentially have random slopes in addition to random intercepts. But this depends on the specific design of the study.
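One general consideration, offered here as an assumption rather than a rule the article spells out, is that a random slope for a predictor across a random factor can only be estimated if the predictor takes more than one value within levels of that factor. The Python sketch below checks this on a tiny, made-up version of the Bob experiment. The function `varies_within` and all data values are hypothetical.

```python
from collections import defaultdict


def varies_within(rows, predictor, factor):
    """Return True if the predictor takes more than one value inside
    at least one level of the grouping factor."""
    values_by_level = defaultdict(set)
    for row in rows:
        values_by_level[row[factor]].add(row[predictor])
    return any(len(values) > 1 for values in values_by_level.values())


# Hypothetical long-format data: one row per trial (Subject x Behavior).
moods = {"S1": "positive", "S2": "negative"}    # measured on the Subject
friendliness = {"B1": True, "B2": False}        # measured on the Behavior
trials = [
    {
        "subject": s,
        "behavior": b,
        "mood": moods[s],
        "friendly": friendliness[b],
        "distractor": (si + bi) % 2 == 0,       # varies from trial to trial
    }
    for si, s in enumerate(moods)
    for bi, b in enumerate(friendliness)
]

for predictor in ("friendly", "mood", "distractor"):
    for factor in ("subject", "behavior"):
        print(predictor, factor, varies_within(trials, predictor, factor))
```

In this toy data, friendliness varies within Subject but not within Behavior, mood varies within Behavior but not within Subject, and the trial-level distractor varies within both. That pattern is the kind of design detail that determines which random slopes are candidates.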
And of course, a study design can get even more complex. You could have more than the two random factors we’ve talked about here, and they can be crossed or nested with each other.
When you learned analysis of variance (ANOVA), it’s likely that the emphasis was on the ANOVA table, with its Sums of Squares and F tests, followed by a post-hoc test. But ANOVA is quite flexible in how it can compare means. A large part of that flexibility comes from its ability to perform many types of statistical contrast.
That F test can tell you if there is evidence your categories are different from each other, which is a start. It is, however, only a start. Once you know at least some categories’ means are different, your next question is “How are they different?” This is what a statistical contrast can tell you.
What is a Statistical Contrast?
A statistical contrast is a comparison of a combination of the means of two or more categories. In practice, contrasts are usually performed as a follow-up to the ANOVA F test. Most statistical programs include contrasts as an optional part of ANOVA analysis.
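A minimal Python sketch of one contrast may help make the definition concrete. It assumes a one-way design with a pooled error variance, and it uses the usual convention that the contrast coefficients sum to zero. The function `contrast`, the group names, and the data are all hypothetical.

```python
from math import sqrt
from statistics import mean


def contrast(groups, coefficients):
    """Estimate a contrast of group means and its standard error.

    groups maps a category name to its list of observations;
    coefficients maps the same names to contrast weights that sum to zero.
    """
    if abs(sum(coefficients.values())) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")

    means = {name: mean(values) for name, values in groups.items()}
    estimate = sum(coefficients[name] * means[name] for name in groups)

    # Pooled within-group variance (the ANOVA mean squared error).
    ss_within = sum(
        (y - means[name]) ** 2 for name, values in groups.items() for y in values
    )
    df_error = sum(len(values) for values in groups.values()) - len(groups)
    mse = ss_within / df_error

    se = sqrt(mse * sum(coefficients[name] ** 2 / len(groups[name]) for name in groups))
    return estimate, se


# Hypothetical data: compare the control mean to the average of two drug means.
groups = {
    "control": [4, 5, 6],
    "drug_a": [7, 8, 9],
    "drug_b": [9, 10, 11],
}
coefficients = {"control": -1.0, "drug_a": 0.5, "drug_b": 0.5}

estimate, se = contrast(groups, coefficients)
print(estimate, se)
```

Dividing the estimate by its standard error gives a t statistic on the error degrees of freedom, which is how this kind of follow-up question is usually tested.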
In part 2 of this series, we got started on the various menus in Stata. This post covers an important menu that you’ll probably use often: the graphics menu.
What’s in the Graphics menu
The graphics menu provides an impressive variety of options for creating just about any graph you might need.
Take a look at the menu. It includes everything from univariate graphs like bar charts and pie charts to more complex, multivariate plots. Go ahead and explore some of the graphs available in the menu.
A comprehensive resource for a full understanding of the graphics you can do in Stata is the Stata Graphics Reference Manual, which is a free pdf download from the Stata web site. At nearly 800 pages, it’s not a quick read (it is excellent, though!).
A much quicker read is the Stata Data Visualization Cheat Sheet (pages 5 – 6).
Browsing this two-page resource will tell you a lot about what you can do in Stata graphics. This includes not only which kinds of graphs you can create, but how to customize a graph’s appearance, apply themes, and save plots.
But first let’s explore how easy it is to create a simple, but customized plot using only the menus.
An Example of creating a Scatter plot using menus
To show an example, we’ll use the auto data. If you haven’t loaded the data in your current session, type a command such as sysuse auto into your command line.
Note that you could also open this data set using the File menu, but this is a command that is so simple, it’s faster to just type it into the command line.
As you’ll see, every time you use the menus, Stata fills in the associated commands for you in the command line.
Now say we want to make a scatter plot with price on the y axis, and mpg on the x axis, but only for observations where the gear ratio is less than 3. We want this graph to have red triangles representing points, and we want it to have informative titles:
We click on Graphics -> Twoway graph. In the plots window click Create and select Basic plots -> Scatter.
Choose price as the y variable and mpg as the x variable; don’t press accept yet.
Under Marker properties choose Triangle as the symbol and Red as the color. Notice that you can also change the size or opacity of points or mark particular observations.
Click accept, then click accept on the next page.
Under the if/in tab, type “gear_ratio<3” so only observations with a gear ratio less than 3 are plotted; click on Y axis.
Under the Y axis tab type the title “Price”, and under the X axis tab type the title “MPG”. Note how you can also change properties of the axis.
In the Titles tab type an appropriate title for the graph – I chose “Price and Mileage”.
We don’t want to see a legend so in the Legend tab choose “hide legend”.
You can now press ok and should see the following graph:
You’ll also see that Stata printed this code to the console:
twoway (scatter price mpg, mcolor(red) msymbol(triangle)) if gear_ratio<3, ytitle(`"Price"') xtitle(`"MPG"') title(`"Price and Mileage"') legend(off)
Now you know how to make this graph and similar ones with syntax as well! If you’re ever having trouble creating a certain chart using code, the graph menu can provide an easier way to select the options you want.
Note: when you make plots in Stata menus, make sure to always make a new plot rather than layering on an old one. If you press create when you already have a plot selected, your new scatterplot will layer on top of the old one.
by James Harrod
About the Author:
James Harrod interned at The Analysis Factor in the summer of 2023. He plans to continue into a career as an actuary, and hopes to continue finding interesting ways of educating people about statistics. James is well-versed in R and Stata programming and enjoys teaching the intuition behind common statistical methods. James is a 2023 graduate of the University of Rochester with bachelor’s degrees in Statistics and Economics.