One of the difficult decisions in mixed modeling is deciding which factors are fixed and which are random. Specifying the fixed and random factors correctly is vital to obtaining an accurate analysis.
The definitions in many texts do little to help with this decision, since textbook examples are often artificial and hard to apply.
Furthermore, the same factor can often be considered fixed or random, depending on the objective; this article outlines a different way to think about fixed and random factors.
Consider an experiment that examines beetle damage on cucumbers. The experiment is replicated at five farms, on four fields at each farm.
There are two varieties of cucumbers, and beetle damage is assessed on each of 50 plants at the end of the season. The researcher is interested in comparing differences in how much damage the two varieties sustain.
The experiment then has the following factors: VARIETY, FARM, and FIELD.
Fixed factors can be thought of in terms of differences.
The effect of a categorical fixed factor is defined by differences from the overall mean, and the effect of a continuous fixed factor (usually called a covariate) is defined by its slope: how the mean of the dependent variable changes as the value of the factor changes.
The output for fixed factors provides estimates of mean differences or slopes.
Conclusions regarding fixed factors are particular to the values of these factors. For example, if one variety of cucumber is found to suffer significantly less damage than the other, this says nothing about cucumber varieties that were not tested.
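As a minimal sketch of what "defined by differences" means, the following Python computes each variety's effect as its mean minus the overall mean. The damage scores and variety labels are invented for illustration; they are not data from the experiment, and the function name `fixed_effects` is hypothetical.

```python
from statistics import mean

# Hypothetical damage scores (invented for illustration), one list per variety.
damage = {
    "variety_A": [12.0, 15.0, 11.0, 14.0],
    "variety_B": [20.0, 18.0, 22.0, 20.0],
}


def fixed_effects(groups):
    """Return each group's effect as its mean minus the overall mean.

    Assumes a balanced design (equal group sizes), so the overall mean
    equals the mean of the group means.
    """
    group_means = {name: mean(values) for name, values in groups.items()}
    overall = mean(group_means.values())
    return {name: m - overall for name, m in group_means.items()}


effects = fixed_effects(damage)
# Difference between the two variety means: the quantity of interest
# when VARIETY is treated as a fixed factor.
difference = mean(damage["variety_B"]) - mean(damage["variety_A"])

print(effects)
print(difference)
```

The two effects are equal in size and opposite in sign, so with two varieties the single mean difference carries all the information. The result describes only these two varieties.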
Random factors, on the other hand, are defined by a distribution and not by differences.
The values of a random factor are assumed to be chosen from a population with a normal distribution with a certain variance.
The output for a random factor is an estimate of this variance and not a set of differences from a mean. Conclusions regarding random factors should be expressed in terms of variance.
For example, we may find that the variance among fields makes up a certain percentage of the overall variance in beetle damage.
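To illustrate how such a percentage might be obtained, here is a rough sketch that estimates a between-field variance and a within-field variance by the method of moments for a balanced one-way layout. This is a simplified stand-in for what mixed-model software does, not the procedure used in the article; the data, the field labels, and the function name `variance_components` are all invented for illustration.

```python
from statistics import mean

# Hypothetical damage scores (invented for illustration): one list per field,
# with the same number of plants measured in every field (balanced design).
fields = {
    "field_1": [10.0, 12.0, 11.0],
    "field_2": [15.0, 17.0, 16.0],
    "field_3": [20.0, 22.0, 21.0],
}


def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects layout.

    Returns (between_group_variance, within_group_variance).
    """
    k = len(groups)                       # number of groups (fields)
    n = len(next(iter(groups.values())))  # observations per group
    grand = mean(v for vals in groups.values() for v in vals)
    group_means = {g: mean(vals) for g, vals in groups.items()}

    # Mean square between groups and mean square within groups.
    ms_between = n * sum((m - grand) ** 2 for m in group_means.values()) / (k - 1)
    ms_within = sum(
        (v - group_means[g]) ** 2 for g, vals in groups.items() for v in vals
    ) / (k * (n - 1))

    # E[MS_between] = var_within + n * var_between; truncate at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within


var_field, var_residual = variance_components(fields)
percent_field = 100 * var_field / (var_field + var_residual)
print(f"{percent_field:.1f}% of the variance is among fields")
```

The output is a share of variance, not a comparison of particular fields, which matches how conclusions about a random factor are expressed.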
Situations that indicate fixed factors:
 The factor is the primary treatment that the researcher wants to compare. In our example, VARIETY is definitely fixed as the researcher wants to compare the mean beetle damage on the two varieties.
 The factor is a secondary control variable, and the researcher wants to control for differences in this factor. If these farms were specifically chosen for some feature they had, such as specific soil types or topographies that may affect beetle damage, and if the researcher would like to compare the farms as representatives of those soil types, then FARM should be fixed.
 The factor has only two values. Even if everything else indicates that a factor should be random, two values are too few to estimate a variance reliably, so the factor should be fixed.
Situations that indicate random factors:

 The researcher is interested in quantifying how much of the overall variation to attribute to this factor. If the researcher were interested in how much of the variation in beetle damage was attributable to the farm at which the damage took place, FARM would be random.
 The researcher is not interested in knowing which means differ, but wants to account for the variation in this factor. If the farms were chosen at random, FARM should be random.
How the specific farms in the study were chosen is key. If you could rerun the study using different farms (different values of the FARM factor) and still draw the same conclusions, then FARM should be random. However, if you wanted to compare or control for these particular farms, then FARM would be fixed.
 The researcher would like to generalize the conclusions about this factor to the whole population. There is nothing about comparing these specific fields that is of interest to the researcher. Rather, the researcher wants to generalize the results of this experiment to all fields, so FIELD is random.
 Any interaction with a random factor is also random.
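One way to write down the specification these guidelines suggest for the cucumber example is sketched below. It assumes VARIETY is fixed, FARM and FIELD are random, fields are nested within farms, and both varieties are grown at every farm. The symbols are introduced here for illustration and do not come from the article.

```latex
y_{ijkl} = \mu + \alpha_i + b_j + (\alpha b)_{ij} + c_{k(j)} + \varepsilon_{ijkl},
\qquad
b_j \sim N(0, \sigma^2_{\mathrm{farm}}), \quad
(\alpha b)_{ij} \sim N(0, \sigma^2_{\mathrm{variety \times farm}}), \quad
c_{k(j)} \sim N(0, \sigma^2_{\mathrm{field}}), \quad
\varepsilon_{ijkl} \sim N(0, \sigma^2)
```

Here $\alpha_i$ is the fixed effect of variety $i$ (a difference from the overall mean $\mu$), while the farm effect $b_j$, the variety-by-farm interaction $(\alpha b)_{ij}$, and the field-within-farm effect $c_{k(j)}$ are each described only by a variance.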
How the factors of a model are specified can have great influence on the results of the analysis and on the conclusions drawn.
{ 2 comments }
Thanks for your article. I have one question about one of the situations that indicate a fixed factor. If a researcher would like to compare farms as representatives of select soil types, would it not make more sense and reduce confusion if the researcher called the factor what it represents, in this case, soil type, rather than to continue calling the factor FARM? The levels, of course, should be changed to appropriate values as well, for example, farm=A might now be soil type=clay loam and farm = B might now be soil type = sand, etc.
Hi Corey,
Certainly. In that design, Farm and Soil Type would be confounded, in fact. The reality is that those farms will differ on other factors as well, and you may or may not have measured those.