# What is a Randomized Complete Block Design?

Designing experiments would always be simple if we could just randomly assign subjects to different treatment conditions with no other restrictions. Unfortunately, that doesn’t always work.

For example, in many experimental situations the subjects aren’t independent of each other. Subjects that are related to each other are grouped into clusters called “blocks.” Blocking can arise from the practicalities of running an experiment efficiently, or you can plan it intentionally as a way to reduce unexplained variance.

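In either case, when every treatment appears in every block and treatments are assigned at random within each block, you have a randomized complete block design. It’s a design worth becoming familiar with because it will greatly expand your ability to create and analyze experiments.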

### How It Works

When you have subjects that share characteristics with one another, it can sometimes be difficult to isolate those characteristics directly. This makes it hard to record them as additional variables. By identifying the subjects that are similar, you can still capture how those characteristics affect the outcome. Subjects that are similar are grouped into “blocks.”

From there, you can make treatment assignments so that you put subjects from the same block into different treatment groups.

Why different treatment groups? Suppose subjects from the same block were assigned to the same treatment group. You wouldn’t know if the difference between that group and other groups is because of the treatment or the block. By spreading out the subjects from each block, you average over the effects of their different characteristics. You can then accurately measure the differences due to the treatment.

### An Example

Often in agricultural experiments, plots of land are the subjects.

To randomly assign independent plots to each treatment condition is expensive. You have to travel more between plots to administer treatments and collect data. This requires more time and administrative work to even find the plots and get permission to use them. You need to sample many more plots to overcome natural variation among them.

A much more efficient design is to use fewer large plots of land that you divide into smaller plots for the purpose of the experiment.

Small plots that are part of the same large plot probably have things in common that they do not share with small plots elsewhere. It could be the soil composition, the amount of sun and wind, the slope of the land, anything.

So you need to treat the large plots as blocks to account for any variation between them. Next, you assign the small plots within any one large plot to different treatment groups. You can do this by randomly assigning the small plots to the groups one large plot at a time.

Assigning one plot within each block to each treatment group ensures that differences between blocks cannot be mistaken for differences between treatments.
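The assignment step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `randomize_within_blocks`, the field and plot names, and the treatment labels are all hypothetical, and it assumes each large plot is split into exactly as many small plots as there are treatments.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign each treatment exactly once within every block.

    blocks: dict mapping a block name to its list of unit names.
    treatments: list of treatment labels. Each block must contain
    exactly len(treatments) units for the design to be complete.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a fresh random order for this block only
        for unit, treatment in zip(units, shuffled):
            assignment[(block, unit)] = treatment
    return assignment

# Hypothetical layout: three large plots, each divided into three small plots.
blocks = {
    "field_A": ["A1", "A2", "A3"],
    "field_B": ["B1", "B2", "B3"],
    "field_C": ["C1", "C2", "C3"],
}
treatments = ["control", "fertilizer_1", "fertilizer_2"]

plan = randomize_within_blocks(blocks, treatments, seed=42)
for (block, unit), treatment in sorted(plan.items()):
    print(block, unit, treatment)
```

Shuffling separately inside each block is what makes the design both randomized and complete: every large plot receives every treatment, but which small plot gets which treatment is left to chance.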

This same efficiency of using clusters of subjects as blocks occurs in other fields. It might mean clusters of rivers in the same watershed; clusters of children in the same classroom; or clusters of fish in the same tank. Trying to find, randomize, and collect data from individual subjects from different watersheds, classrooms, or tanks is generally too resource intensive and logistically difficult.

### Analysis of Data from a Randomized Complete Block Design

It’s vital that you account for the blocks in the analysis. There are two general options.

The first is more traditional. You add blocks to the statistical model as an additional categorical variable, just like any other. The difference is that you would not test the effects of the blocks. The only purpose is to isolate the blocks’ effect on the outcome. This way, you can better estimate the differences between the treatment groups.
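To make the first approach concrete, here is a hedged sketch of the sums-of-squares arithmetic for a design with one observation per block and treatment combination. The yield numbers are made up purely for illustration, and `rcbd_anova` is a hypothetical helper, not a function from any library. In practice you would fit this model in statistical software, which would also supply the p-value.

```python
def rcbd_anova(y):
    """ANOVA decomposition for a randomized complete block design.

    y[j][i] is the outcome for block j under treatment i,
    with exactly one observation per cell.
    """
    b, t = len(y), len(y[0])  # number of blocks, number of treatments
    grand = sum(map(sum, y)) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(row[i] for row in y) / b for i in range(t)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    # What is left after removing both block and treatment effects.
    ss_error = sum(
        (y[j][i] - block_means[j] - trt_means[i] + grand) ** 2
        for j in range(b) for i in range(t)
    )

    df_trt, df_error = t - 1, (b - 1) * (t - 1)
    ms_trt = ss_trt / df_trt
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_block": ss_block,
        "ss_trt": ss_trt, "ss_error": ss_error,
        "df_trt": df_trt, "df_error": df_error,
        "f_trt": ms_trt / ms_error,  # blocks are adjusted for, not tested
    }

# Made-up yields: each row is a block, each column a treatment.
yields = [
    [10, 12, 14],
    [20, 23, 23],
    [15, 16, 20],
    [30, 33, 36],
]
result = rcbd_anova(yields)
print(result)
```

In this made-up data most of the total variation sits in the block sum of squares. Removing it from the error term is what lets the comparatively small treatment differences stand out.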

The second method is to treat block as a random factor and use a mixed model. The idea is that each block has its own random effect on the outcome. This effect is added to or subtracted from the overall average to adjust for the way a specific block affects the outcome. This is appropriate if you believe the blocks in the experiment represent a random sample from a larger population of blocks.

In the agricultural study example, the mixed model approach assumes the large plots of land in the experiment are representative of all plots of land where the treatment might be used in the future. Note that this method allows you to learn about the variance contributed by different large plots. This could be helpful for understanding the different results you might see when the treatment is put into practice.
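As a rough illustration of “the variance contributed by different large plots,” the sketch below estimates the between-block variance with a simple method-of-moments calculation. This is a swapped-in simplification, not a full mixed-model fit: it assumes a balanced design with one observation per cell, the yields are made up, and `block_variance_component` is a hypothetical name. Mixed-model software would typically estimate the same quantity by restricted maximum likelihood.

```python
def block_variance_component(y):
    """Method-of-moments estimate of the between-block variance.

    y[j][i] is the outcome for block j under treatment i, with one
    observation per cell. Returns (sigma2_block, sigma2_error).
    """
    b, t = len(y), len(y[0])
    grand = sum(map(sum, y)) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(row[i] for row in y) / b for i in range(t)]

    ms_block = t * sum((m - grand) ** 2 for m in block_means) / (b - 1)
    ss_error = sum(
        (y[j][i] - block_means[j] - trt_means[i] + grand) ** 2
        for j in range(b) for i in range(t)
    )
    ms_error = ss_error / ((b - 1) * (t - 1))

    # E[MS_block] = sigma2_error + t * sigma2_block, so solve for sigma2_block.
    # Truncate at zero because a variance cannot be negative.
    sigma2_block = max(0.0, (ms_block - ms_error) / t)
    return sigma2_block, ms_error

# Made-up yields: each row is a large plot (block), each column a treatment.
yields = [
    [10, 12, 14],
    [20, 23, 23],
    [15, 16, 20],
    [30, 33, 36],
]
sigma2_block, sigma2_error = block_variance_component(yields)
print("between-block variance:", sigma2_block)
print("within-block error variance:", sigma2_error)
```

A between-block variance that is large relative to the error variance suggests that results could differ noticeably from one large plot to the next when the treatment is used elsewhere.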

### Final Notes

Although it can be tempting to ignore relationships among subjects, this is a bad idea! If you ignore them, at best you fail to adjust for real variability that you should account for. This makes the estimated standard errors of your treatment effects inaccurate.

In turn, this makes p-values and confidence intervals inaccurate.
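A small numeric sketch shows how much the standard error can change. It computes the standard error of the difference between two treatment means twice on the same made-up yields: once ignoring the blocks and once accounting for them. The data and the helper name `se_of_difference` are illustrative assumptions only.

```python
from math import sqrt

def se_of_difference(y, use_blocks):
    """Standard error of the difference between two treatment means.

    y[j][i] is the outcome for block j under treatment i.
    use_blocks=False analyzes the data as if there were no blocks.
    """
    b, t = len(y), len(y[0])
    grand = sum(map(sum, y)) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(row[i] for row in y) / b for i in range(t)]

    if use_blocks:
        # Residuals after removing both treatment and block effects.
        resid = [y[j][i] - block_means[j] - trt_means[i] + grand
                 for j in range(b) for i in range(t)]
        df = (b - 1) * (t - 1)
    else:
        # Residuals after removing treatment effects only.
        resid = [y[j][i] - trt_means[i]
                 for j in range(b) for i in range(t)]
        df = t * (b - 1)

    ms_error = sum(r ** 2 for r in resid) / df
    return sqrt(2 * ms_error / b)  # each treatment mean averages b values

# Made-up yields: each row is a block, each column a treatment.
yields = [
    [10, 12, 14],
    [20, 23, 23],
    [15, 16, 20],
    [30, 33, 36],
]
se_ignored = se_of_difference(yields, use_blocks=False)
se_blocked = se_of_difference(yields, use_blocks=True)
print("SE ignoring blocks:", se_ignored)
print("SE with blocks:    ", se_blocked)
```

With these numbers, ignoring the blocks leaves the block-to-block variation in the error term, so the standard error comes out several times larger than when the blocks are accounted for.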

At worst, randomization could coincidentally assign the treatment in some way that biases the results. This will lead to the wrong conclusions about the treatment.

Understanding the randomized complete block design helps you avoid these issues. Once you understand it, the design is not hard to execute and analyze.
