# Analysis of Variance and Covariance

## Articles at The Analysis Factor

• 3 Mistakes Data Analysts Make in Testing Assumptions in GLM - I know you know it--those assumptions in your regression or ANOVA model really are important. If they're not met adequately, all your p-values are inaccurate, wrong, useless. But, and this is a big one, the GLM is robust to departures from those assumptions. Meaning, they don't have to fit exactly to be accurate, right, useful.
• 3 Reasons Psychology Researchers should Learn Regression - Why should you, as a researcher in Psychology, Education, or Agriculture, who is trained in ANOVA, need to learn linear regression? There are 3 main reasons.
• 3 Situations when it makes sense to Categorize a Continuous Predictor in a Regression Model - But it can be very useful and legitimate to be able to choose whether to treat an independent variable as categorical or continuous. Knowing when it is appropriate and understanding how it affects interpretation of parameters allows the data analyst to find real results that might otherwise have been missed.
• 7 Practical Guidelines for Accurate Statistical Model Building - But if the point is to answer a research question that describes relationships, you're going to have to get your hands dirty. It's easy to say "use theory" or "test your research question" but that ignores a lot of practical issues. Like the fact that you may have 10 different variables that all measure the same theoretical construct, and it's not clear which one to use.
• A Comparison of Effect Size Statistics - Another set of effect size measures for categorical independent variables have a more intuitive interpretation, and are easier to evaluate. They include Eta Squared, Partial Eta Squared, and Omega Squared. Like the R Squared statistic, they all have the intuitive interpretation of the proportion of the variance accounted for.
• About Dummy Variables in SPSS Analysis - I know that if I included 5 dummy location variables (6 locations in total, with A as the reference group) in 1 block of the regression analysis, the result would be based on the comparison with the reference location. Then what if I put 6 dummies (for example, the 1st dummy would be "1" for A location, and "0" for otherwise) in 1 block? Will it be a bug? If not, how to interpret the result?
• Actually, you can interpret some main effects in the presence of an interaction - One of those “rules” about statistics you often hear is that you can’t interpret a main effect in the presence of an interaction. Stats professors seem particularly good at drilling this into students’ brains. Unfortunately, it’s not true. At least not always.
• Analyzing Pre-Post Data with Repeated Measures or ANCOVA - This kind of situation happens all the time, in which a colleague, a reviewer, or a statistical consultant insists that you need to do the analysis differently. Sometimes they're right, but sometimes, as was true here, the two analyses answer different research questions.
• Anatomy of a Normal Probability Plot - Across the bottom are the observed data values, sorted lowest to highest. You can see that just like on the histogram, the values range from about -2.2 to 2.2. (Note, these are standardized residuals, so they already have a mean of 0 and a standard deviation of 1. If they didn’t, the plot would standardize them before plotting).
• ANCOVA Assumptions: When Slopes are Unequal - Of course, the main effect for condition in this full model with the interaction will test the same thing, as well as give you additional information at different ages. So your second option is:
• Approaches to Repeated Measures Data: Repeated Measures ANOVA, Marginal, and Mixed Models - In a marginal model, we can directly estimate the correlations among each individual's residuals. (We do assume the residuals across different individuals are independent of each other). We can specify that they are equally correlated, as in the RM ANOVA, but we're not limited to that assumption. Each correlation can be unique, or measurements closer in time can have higher correlations than those farther away. There are a number of common patterns that the residuals tend to take.
• April 2013 Member Webinar: Hierarchical Regressions - Hierarchical regression is a very common approach to model building that allows you to see the incremental contribution to a model of sets of predictor variables. Popular for linear regression in many fields, the approach can be used in any type of regression model — logistic regression, linear mixed models, or even ANOVA. In this webinar, we’ll go over the concepts and steps, and we’ll look at how it can be useful in different contexts.
• Assumptions of Linear Models are about Residuals, not the Response Variable - I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable. The asker had a situation where Y, the response, was not normally distributed, but the residuals were.
• Beyond Median Splits: Meaningful Cut Points - It's true that median splits are arbitrary, but there are situations when it is reasonable to make a continuous numerical variable categorical.
• Can Likert Scale Data ever be Continuous? - A very common question is whether it is legitimate to use Likert scale data in parametric statistical procedures that require interval data, such as Linear Regression, ANOVA, and Factor Analysis. A typical Likert scale item has 5 to 11 points that indicate the degree of agreement with a statement, such as 1=Strongly Agree to 5=Strongly […]
• Checking Assumptions in ANOVA and Linear Regression Models: The Distribution of Dependent Variables - The distributional assumptions for linear regression and ANOVA are for the distribution of Y|X -- that's Y given X. You have to take out the effects of all the Xs before you look at the distribution of Y.
• Checking the Normality Assumption for an ANOVA Model - The assumptions are exactly the same for ANOVA and regression models. The normality assumption is that residuals follow a normal distribution. You usually see it like this: ε~ i.i.d. N(0, σ²) But what it's really getting at is the distribution of Y|X.
• Concepts in Linear Regression you need to know before learning Multilevel Models - Learning Multilevel Models is hard enough. Even worse if you don't have a good background in linear regression. This article outlines 4 must-know concepts.
• Confusing Statistical Term #6: Factor - Factor is tricky much in the same way as hierarchical and beta, because it too has different meanings in different contexts. Factor might be a little worse, though, because its meanings are related. In both meanings, a factor is a variable. But a factor has a completely different meaning and implications for use in two different contexts. In factor analysis, a factor is an unmeasured, latent variable that expresses itself through its relationship with other measured variables.
• Confusing Statistical Term #7: GLM - Like some of the other terms in our list--level and beta--GLM has two different meanings. It's a little different than the others, though, because it's an abbreviation for two different terms: General Linear Model and Generalized Linear Model. It's extra confusing because their names are so similar on top of having the same abbreviation.
• Confusing Statistical Terms #1: The Many Names of Independent Variables - Statistical models, such as general linear models (linear regression, ANOVA, mixed models) and generalized linear models (logistic, Poisson, proportional hazard regression, etc.) all have the same general form. On the left side of the equation is one or more response variables, Y. On the right hand side is one or more predictor variables, X, and […]
• Confusing Statistical Terms #3: Levels of a Factor in Multilevel Models Measured at a Nominal Level - It struck me today in answering a question that statisticians have not been very helpful to those trying to learn statistics in the way they name statistical terms. I can think of other examples (how many totally different concepts does alpha refer to in statistics?), but the term I was using today was levels. Specifically, […]
• Confusing Statistical Terms #5: Covariate - Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts. Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways.  And these different ways of using the […]
• Continuous and Categorical Variables: The Trouble with Median Splits - A Median Split is one method for turning a continuous variable into a categorical one. Essentially, the idea is to find the median of the continuous variable. Any value below the median is put in the category “Low” and every value above it is labeled “High.” This is a very common practice in many social […]
• December 2013 Member Webinar: Interactions in ANOVA and Regression Models, Part 1 - There is something about interactions that is incredibly confusing. An interaction between two predictor variables means that one predictor variable affects a third variable differently at different values of the other predictor.
• Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups, Part 1 - So the question is what to do with your categorical variables. You have two choices, and each has advantages and disadvantages. The easiest is to put categorical variables in Fixed Factors. SPSS will dummy code those variables for you, which is quite convenient if your categorical variable has more than two categories. However, there are some defaults you need to be aware of that may or may not make this a good choice. SPSS always makes the reference group the one that comes last alphabetically. So if the values you input are strings, it will be the one that comes last. If those values are numbers, it will be the highest one.
• Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups, Part 2 - Yesterday’s post outlined one issue in deciding whether to put a categorical predictor variable into Fixed Factors or Covariates in SPSS GLM.  That issue dealt with how SPSS automatically creates dummy variables out of any variable in Fixed Factors. Another default to keep in mind is that SPSS will automatically create interactions between any and […]
• February 2013 Member Webinar: Types of Regression Models and When to Use Them - Linear, Logistic, Tobit, Cox, Poisson, Zero Inflated… The list of regression models goes on and on before you even get to things like ANCOVA or Linear Mixed Models. In this webinar, we will explore types of regression models, how they differ, how they’re the same, and most importantly, when to use each one.
• Five Advantages of Running Repeated Measures ANOVA as a Mixed Model - There are two ways to run a repeated measures analysis. The traditional way is to treat it as a multivariate test--each response is considered a separate variable. The other way is to run it as a mixed model. While the multivariate approach is easy to run and quite intuitive, there are a number of advantages to running a repeated measures analysis as a mixed model.
• Five Extensions of the General Linear Model - Generalized linear models, linear mixed models, generalized linear mixed models, marginal models, GEE models. You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the general linear model. This is true, and they extend our old friend in different ways, particularly in regard to the measurement level of the dependent variable and the independence of the measurements. So while the names are similar (and confusing), the distinctions are important.
• GLM in SPSS: Centering a Covariate to Improve Interpretability - The reason for centering a continuous covariate is that it can improve interpretability. For example, say you had one categorical predictor with 4 categories and one continuous covariate, plus an interaction between them. First, you’ll notice that if you center your covariate at the mean, there is
• Have you Wondered how using SPSS Burns Calories? - Number 4: This morning, I received an email listing some interesting facts, among them: "Banging your head against a wall burns 150 calories an hour." I'm pretty sure that one is not specifically about SPSS, but it could be.
• How Simple Should a Model Be? The Case of Insignificant Controls, Interactions, and Covariance Structures - “Everything should be made as simple as possible, but no simpler” – Albert Einstein* For some reason, I’ve heard this quotation 3 times in the past 3 days.  Maybe I hear it everyday, but only noticed because I’ve been working with a few clients on model selection, and deciding how much to simplify a model. […]
• How to Calculate Effect Size Statistics - Luckily, all the effect size measures are relatively easy to calculate from information in the ANOVA table on your output. Here are a few common ones:
• Interpreting (Even Tricky) Regression Coefficients – A Quiz - Here’s a little quiz: True or False? 1. When you add an interaction to a regression model, you can still evaluate the main effects of the terms that make up the interaction, just like in ANOVA. 2. The intercept is usually meaningless in a regression model.
• Interpreting Interactions Between Two Effect-Coded Categorical Predictors - I recently received this great question: Question: Hi Karen,  ive purchased a lot of your material and read a lot of your pdf documents w.r.t. regression and interaction terms.  Its, now, my general understanding that interaction for two or more categorical variables is best done with effects coding, and interactions  cont v. categorical variables is […]
• Interpreting Interactions when Main Effects are Not Significant - If you have a significant interaction effect and non-significant main effects, would you interpret the interaction effect? It's a question I get pretty often, and it's a more straightforward answer than most. There is really only one situation possible in which an interaction is significant, but the main effects are not: a cross-over interaction.
• Interpreting Interactions: When the F test and the Simple Effects disagree. - The way to follow up on a significant two-way interaction between two categorical variables is to check the simple effects. Every so often, however, you have a significant interaction, but no significant simple effects. It is not a logical impossibility. They are testing two different, but related hypotheses.
• January 2014 Member Webinar: Interactions in ANOVA and Regression Models, Part 2 - In this follow-up to December’s webinar, we’ll finish up our discussion of interactions. There is something about interactions that is incredibly confusing. An interaction between two predictor variables means that one predictor variable affects a third variable differently at different values of the other predictor.
• January 2015 Member Webinar: ANCOVA (Analysis of Covariance) - Data analysts can get away without ever understanding matrix algebra, certainly. But there are times when having even a basic understanding of how matrix algebra works and what it has to do with data can really make your analyses make a little more sense.
• July 2017 Member Webinar: The Multi-Faceted World of Residuals - Residuals can be a very broad topic - one that most everyone has heard of, but few people truly understand. It’s time to change that. By definition, a “residual” is “the quantity remaining after other things have been subtracted or allowed for.” In statistics, we use the term in a similar fashion. Residuals come in various forms: Standardized, Studentized, Pearson, and Deviance. But which ones do we use… and why?
• June 2013 Member Webinar: MANOVA - MANOVA is the multivariate (meaning multiple dependent variables) version of ANOVA, but there are many misconceptions about it. In this webinar, you’ll learn: when to use MANOVA and when you’d be better off using individual ANOVAs; how to follow up the overall MANOVA results to interpret them; what those strange statistics mean, such as Wilks’ lambda and Roy’s Greatest Root (hint: it’s not a carrot); and its relationship to discriminant analysis.
• Linear Mixed Models for Missing Data in Pre-Post Studies - In the past few months, I've gotten the same question from a few clients about using linear mixed models for repeated measures data. They want to take advantage of its ability to give unbiased results in the presence of missing data. In each case the study has two groups complete a pre-test and a post-test measure. Both of these have a lot of missing data...
• May 2013 Member Webinar: Using Excel to Graph Predicted Values from Regression Models - Graphing predicted values from a regression model or means from an ANOVA makes interpretation of results much easier. Every statistical software will graph predicted values for you. But the more complicated your model, the harder it can be to get the graph you want in the format you want. Excel isn’t all that useful for estimating the statistics, but it has some very nice features that are useful for doing data analysis, one of which is graphing.
• Model Building Strategies: Step Up and Top Down - How should I build my model? I get this question a lot, and it's difficult to answer at first glance--it depends too much on your particular situation. There are really three parts to the approach to building a model: the strategy, the technique to implement that strategy, and the decision criteria used within the technique.
• New version released of Amelia II: A Program for Missing Data - A new version of Amelia II, a free package for multiple imputation, has just been released today.
• Non-parametric ANOVA in SPSS - I sometimes get asked questions that many people need the answer to. Here's one about non-parametric anova. Question: Is there a non-parametric 3 way ANOVA out there and does SPSS have a way of doing a non-parametric anova sort of thing with one main independent variable and 2 highly influential cofactors?
• One-tailed and two-tailed tests - I was recently asked about when to use one and two tailed tests. The long answer is:  Use one tailed tests when you have a specific hypothesis about the direction of your relationship.  Some examples include you hypothesize that one group mean is larger than the other; you hypothesize that the correlation is positive; you […]
• Problems Caused by Categorizing Continuous Variables - I just came across this great article by Frank Harrell:  Problems Caused by Categorizing Continuous Variables It’s from the Vanderbilt University biostatistics department, so the examples are all medical, but the points hold for any field. It goes right along with my recent post, Continuous and Categorical Variables: The Trouble with Median Splits.
• Six Differences Between Repeated Measures ANOVA and Linear Mixed Models - As mixed models are becoming more widespread, there is a lot of confusion about when to use these more flexible but complicated models and when to use the much simpler and easier-to-understand repeated measures ANOVA. One thing that makes the decision harder is sometimes the results are exactly the same from the two models and sometimes the results are vastly different. In many ways, repeated measures ANOVA is antiquated -- it's never better or more accurate than mixed models. That said, it's a lot simpler. As a general rule, you should use the simplest analysis that gives accurate results and answers the research question. I almost never use repeated measures ANOVA in practice, because it's rare to find an analysis where the flexibility of mixed models isn't an advantage. But they do exist. Here are some guidelines on similarities and differences:
• Specifying Fixed and Random Factors in Mixed Models - Since SAS introduced Proc Mixed about fifteen years ago, S-Plus, Stata and SPSS have implemented procedures to analyze mixed models, greatly broadening the options available to researchers. These programs require correctly specifying the fixed and random factors of the model to obtain accurate analyses. The definitions in many texts often do not help with decisions […]
• Specifying Variables as Within-Subjects Factors in Repeated Measures - I want to do a GLM (repeated measures ANOVA) with the valence of some actions of my test-subjects (valence = desirability of actions) as a within-subject factor. My subjects have to rate a number of actions/behaviours in a pre-set list of 20 actions from ‘very likely to do’ to ‘will never do this’ on a scale from 1 to 7,..
• Spotlight Analysis for Interpreting Interactions - Not too long ago, a client asked for help with using Spotlight Analysis to interpret an interaction in a regression model. Spotlight Analysis? I had never heard of it. As it turns out, it’s a (snazzy) new name for an old way of interpreting an interaction between a continuous and a categorical grouping variable in a regression model...
• SPSS GLM: Choosing Fixed Factors and Covariates - The beauty of the Univariate GLM procedure in SPSS is that it is so flexible.  You can use it to analyze regressions, ANOVAs, ANCOVAs with all sorts of interactions, dummy coding, etc. The down side of this flexibility is it is often confusing what to put where and what it all means. So here’s a […]
• Steps to Take When Your Regression (or Other Statistical) Results Just Look…Wrong - You’ve probably experienced this before. You’ve done a statistical analysis, you’ve figured out all the steps, you finally get results and are able to interpret them. But they just look…wrong. Backwards, or even impossible—theoretically or logically. This happened a few times recently to a couple of my consulting clients, and once to me. So I […]
• Testing and Dropping Interaction Terms in Regression and ANOVA models - In an ANOVA or regression model, should you drop interaction terms if they're not significant? As with everything in statistics, it depends.
• The 13 Steps for Statistical Modeling in any Regression or ANOVA - No matter what statistical model you’re running, you need to go through the same 13 steps.  The order and the specifics of how you do each step will differ depending on the data and the type of model you use. These 13 steps are in 3 major parts.  Most people think of only Part 3 […]
• The 3 Stages of Mastering Statistical Analysis - Like any applied skill, mastering statistical analysis requires: 1. building a body of knowledge; 2. adeptness with the tools of the trade (aka software package); 3. practice applying the knowledge and using the tools in a realistic, meaningful context.
• The Assumptions of Linear Models: Explicit and Implicit - These assumptions are explicitly stated by the model: 1. The residuals are independent; 2. The residuals are normally distributed; 3. The residuals have a mean of 0 at all values of X; 4. The residuals have constant variance.
• The Difference Between Clustered, Longitudinal, and Repeated Measures Data - In repeated measures data, the dependent variable is measured more than once for each subject. Usually, there is some independent variable (often called a within-subject factor) that changes with each measurement. And in longitudinal data, the dependent variable is measured at several time points for each subject, often over a relatively long period of time.
• The Difference Between Crossed and Nested Factors - One of those tricky, but necessary, concepts in statistics is the difference between crossed and nested factors. As a reminder, a factor is just any categorical independent variable. In experiments, or any randomized designs, these factors are often manipulated. Experimental manipulations (like Treatment vs. Control) are factors. Observational categorical predictors, such as gender, time point, […]
• The Difference Between Eta Squared and Partial Eta Squared - For ANOVAs, two of the most popular are Eta-squared and partial Eta-squared. In one-way ANOVAs, they come out the same, but in more complicated models, their values and their meanings differ.
• The Difference Between Interaction and Association - Interaction is different. Whether two variables are associated says nothing about whether they interact in their effect on a third variable. Likewise, if two variables interact, they may or may not be associated.
• The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes - But that's really just one application of a linear model with one categorical and one continuous predictor. The research question of interest doesn't have to be about the categorical predictor, and the covariate doesn't have to be a nuisance variable. A regression model with one continuous and one dummy variable is the same model (actually, you'd need two dummy variables to cover the three categories, but that's another story).
• The Problem with Using Tests for Statistical Assumptions - Every statistical model and hypothesis test has assumptions. And yes, if you’re going to use a statistical test, you need to check whether those assumptions are reasonable to whatever extent you can. Some assumptions are easier to check than others. Some are so obviously reasonable that you don’t need to do much to check them […]
• The Repeated and Random Statements in Mixed Models for Repeated Measures - Here's one example of the flexibility of mixed models, and its resulting potential for confusion and error. In repeated measures and longitudinal studies, the observations are clustered within a subject. That means the observations, and their residuals, are not independent. They're correlated. There are two ways to deal with this correlation.
• The Wide and Long Data Format for Repeated Measures Data - In many repeated measures data situations, you will need to set up the data different ways for different parts of the analyses. This article will outline one of the issues in data set up: using the long vs. the wide data format.
• Using Adjusted Means to Interpret Moderators in Analysis of Covariance - If you use the menus in SPSS, you can only get those EMMeans at the Covariate's mean, which in this example is about 25, where the vertical black line is. This isn't very useful for our purposes. But we can change the value of the covariate at which to compare the means using syntax.
• What Is Regression to the Mean? - Have you ever heard that “2 tall parents will have shorter children”? This phenomenon, known as regression to the mean, has been used to explain everything from patterns in hereditary stature (as Galton first did in 1886) to why movie sequels or sophomore albums so often flop. So just what is regression to the mean (RTM)?
• What’s in a Name? Moderation and Interaction, Independent and Predictor Variables - When we talk about moderation, though, there is a specific role to X and Z. One is assigned as the Independent Variable and the other as the Moderator. The Independent Variable is an independent variable based on the third implication listed above: its effect is of primary interest.
• When Assumptions of ANCOVA are Irrelevant - Every once in a while, I work with a client who is stuck between a particular statistical rock and hard place. It happens when they're trying to run an analysis of covariance (ANCOVA) model because they have a categorical independent variable and a continuous covariate. The problem arises when a coauthor, committee member, or reviewer insists that ANCOVA is inappropriate in this situation because one of the following ANCOVA assumptions is not met: (1) The independent variable and the covariate are independent of each other; (2) There is no interaction between the independent variable and the covariate.
• When Dependent Variables Are Not Fit for Linear Models, Now What? - When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, your model will never meet the Assumptions of the General Linear Model (GLM).  Today I’m going to go into more detail about these 6 common types of dependent variables, and the tests that work instead. Categorical Variables, including both […]
• When Does Repeated Measures ANOVA not work for Repeated Measures Data? - Repeated measures ANOVA is the approach most of us learned in stats classes, and it works very well in certain designs. But it’s a bit limited in what it can do. Sometimes trying to fit a data set into a repeated measures ANOVA requires too much data gymnastics—averaging across repetitions or pretending a continuous predictor isn’t really.
• When Dummy Codes are Backwards, Your Stat Software may be Messing With You - In SAS proc glm, when you specify a predictor as categorical in the CLASS statement, it will automatically dummy code it for you in the parameter estimates table (the regression coefficients). The default reference category--what GLM will code as 0--is the highest value. This works just fine if your values are coded 1, 2, and 3. But if you've dummy coded them already, it's switching them on you.
• When to Check Model Assumptions - If any of these fail, it’s nearly impossible to get normally distributed residuals, even with remedial transformations. Types of variables that will generally fail these criteria include: Categorical Variables, both nominal and ordinal. Count Variables, which are often distributed as Poisson or Negative Binomial.
• When to leave insignificant effects in a model - You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values–they use up precious df in small samples. But if your sample isn’t small, the effect is negligible. The bigger effect […]
• When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA - Few data sets are completely balanced, with equal sample sizes in every condition. But are they really the scary problem your stats professor made them out to be? Only sometimes.
• Why ANOVA and Linear Regression are the Same Analysis - ANOVA and Linear Regression are not only related, they're the same thing. Not a quarter and a nickel--different sides of the same coin. This article shows why.
• Why ANOVA is Really a Linear Regression, Despite the Difference in Notation - When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.”  But they never explained why. And I couldn’t figure it out. The model notation is different. The output looks different. The vocabulary is different. The focus of what we’re testing is completely different. How can they […]
• Why report estimated marginal means in SPSS GLM? - The Estimated Marginal Means in SPSS GLM are the means of each factor or interaction you specify, adjusted for any other variables in the model.
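
Several of the entries above mention Eta Squared and Partial Eta Squared and note that they can be calculated from the ANOVA table. As a rough illustration (not taken from any of the linked articles), the sketch below applies the standard definitions: Eta Squared is the effect's sum of squares divided by the total, and Partial Eta Squared is the effect's sum of squares divided by the sum of the effect and error sums of squares. The function name `effect_sizes` and the sums of squares are made up for the example, and the total is taken as the sum of all effects plus error, which assumes a balanced design.

```python
def effect_sizes(ss_effects, ss_error):
    """Return eta squared and partial eta squared for each effect.

    ss_effects: dict mapping an effect name to its sum of squares
    ss_error:   residual (error) sum of squares

    Assumption: the design is balanced, so the total sum of squares
    equals the sum of all effect sums of squares plus the error.
    """
    ss_total = sum(ss_effects.values()) + ss_error
    sizes = {}
    for name, ss in ss_effects.items():
        sizes[name] = {
            # Proportion of total variance accounted for by this effect
            "eta_sq": ss / ss_total,
            # Proportion of (effect + error) variance, ignoring other effects
            "partial_eta_sq": ss / (ss + ss_error),
        }
    return sizes


# Hypothetical one-way ANOVA: a single effect, so the two measures agree.
one_way = effect_sizes({"group": 30.0}, ss_error=70.0)

# Hypothetical two-way ANOVA: other effects are in the model, so they differ.
two_way = effect_sizes({"A": 20.0, "B": 30.0, "A:B": 10.0}, ss_error=40.0)

for name, values in two_way.items():
    print(name, round(values["eta_sq"], 3), round(values["partial_eta_sq"], 3))
```

In the one-way case the only other source of variance is error, so both formulas share the same denominator. In the two-way case Partial Eta Squared removes the other effects from the denominator, so it is at least as large as Eta Squared for every effect.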