Linear Regression


online workshops

Linear Models

Most of the statistical analyses you need to do as a researcher are most likely linear regressions or ANOVAs. Or some extension of them. But how well do you really understand linear models? learn more

Interpreting (Even Tricky) Regression Coefficients

The statistics classes you’ve had so far have probably focused on straightforward models with only continuous predictors. But real research, unfortunately, isn’t usually that simple. At some point, you’re going to need to build a more complex model. learn more


the craft of statistical analysis free webinars

Interpreting Linear Regression Coefficients

There are many coefficients in linear regression models that are difficult to interpret — interactions, categorical predictors, centered predictors. Put them together into one model and it’s even harder! learn more

Four Critical Steps in Building Linear Regression Models

A primary consideration in model building is which variables to include in the model. A secondary one is deciding which predictors to retain in the model. But the decisions don’t stop there. A number of other considerations can make model building either very straightforward or extremely frustrating. learn more


statistically speaking member trainings

Multicollinearity

Multicollinearity isn’t an assumption of regression models; it’s a data issue. And while it can be seriously problematic, more often it’s just a nuisance. learn more

Hierarchical Regression

Hierarchical regression is a very common approach to model building that allows you to see the incremental contribution to a model of sets of predictor variables. Popular for linear regression in many fields, the approach can be used in any type of regression model — logistic regression, linear mixed models, or even ANOVA. learn more

Using Excel to Graph Predicted
Values from Regression Models

Graphing predicted values from a regression model or means from an ANOVA makes interpretation of results much easier. Every statistical software will graph predicted values for you. But the more complicated your model, the harder it can be to get the graph you want in the format you want. learn more

ANCOVA (Analysis of Covariance)

Analysis of Covariance (ANCOVA) is a type of linear model that combines the best abilities of linear regression with the best of Analysis of Variance. It allows you to test differences in group means and interactions, just like ANOVA, while covarying out the effect of a continuous covariate. learn more

Dummy and Effect Coding

Why does ANOVA give main effects in the presence of interactions, but Regression gives marginal effects? What are the advantages and disadvantages of dummy coding and effect coding? When does it make sense to use one or the other? How does each one work, really? learn more

Transformations & Nonlinear Effects in Linear Models

Why is it we can model non-linear effects in linear regression? What the heck does it mean for a model to be “linear in the parameters?” We explore a number of ways of using a linear regression to model a non-linear effect between X and Y. learn more

The Multi-Faceted World of Residuals

Most analysts’ primary focus is to check the distributional assumptions with regard to residuals. They must be independent and identically distributed (i.i.d.) with a mean of zero and constant variance. Residuals can also give us insight into the quality of our models. learn more

Using Transformations to Improve Your
Linear Regression Model

Transformations don’t always help, but when they do, they can improve your linear regression model in several ways simultaneously. They can help you better meet the linear regression assumptions of normality and homoscedasticity (i.e., equal variances). They also can help avoid some of the artifacts caused by boundary limits in your dependent variable, and sometimes even remove a difficult-to-interpret interaction. learn more

Marginal Means, Your New Best Friend

Interpreting regression coefficients can be tricky, especially when the model has interactions or categorical predictors (or worse – both). But there is a secret weapon that can help you make sense of your regression results: marginal means. learn more

Segmented Regression

Linear regression with a continuous predictor is set up to measure the constant relationship between that predictor and a continuous outcome. This relationship is measured in the expected change in the outcome for each one-unit change in the predictor. learn more

Quantile Regression: Going
Beyond the Mean

In your typical statistical work, chances are you have already used quantiles such as the median, 25th or 75th percentiles as descriptive statistics. But did you know quantiles are also valuable in regression, where they can answer a broader set of research questions than standard linear regression? learn more


articles at the analysis factor

Should I Specify a Model Predictor as
Categorical or Continuous?

Predictor variables in statistical models can be treated as either continuous or categorical. Usually, this is a very straightforward decision. But there are numerical predictors that aren’t continuous. And these can sometimes make sense to treat as continuous and sometimes make sense as categorical. learn more

What Is Specification Error in
Statistical Models?

When we think about model assumptions, we tend to focus on assumptions like independence, normality, and constant variance. The other big assumption, which is harder to see or test, is that there is no specification error. The assumption of linearity is part of this, but it’s actually a bigger assumption. learn more

Steps to Take When Your Regression (or
Other) Results Just Look… Wrong

You’ve probably experienced this before. You’ve done a statistical analysis, you’ve figured out all the steps, you finally get results and are able to interpret them. But they just look…wrong. Backwards, or even impossible—theoretically or logically. learn more

Understanding Interactions Between Categorical
and Continuous Variables in Linear Regression

We’ve looked at the interaction effect between two categorical variables. But what if our predictors of interest, say, are a categorical and a continuous variable? How do we interpret the interaction between the two? learn more

The Distribution of Independent Variables
in Regression Models

While regression models make a number of distributional assumptions, none of them concerns the distribution of the predictor (i.e., independent) variables. That’s because regression models are directional. In a correlation, there is no direction: Y and X are interchangeable. If you switched them, you’d get the same correlation coefficient. learn more
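To make the point about direction concrete, here is a minimal sketch in plain Python. The data values and the helper names (`cross_products`, `correlation`, `slope`) are made up for illustration and are not taken from the article. It shows that swapping X and Y leaves the correlation unchanged but changes the regression slope.

```python
from math import sqrt
from statistics import mean

def cross_products(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def correlation(a, b):
    """Pearson correlation; symmetric in its two arguments."""
    return cross_products(a, b) / sqrt(cross_products(a, a) * cross_products(b, b))

def slope(predictor, outcome):
    """Least-squares slope from regressing outcome on predictor."""
    return cross_products(predictor, outcome) / cross_products(predictor, predictor)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

corr_xy = correlation(x, y)   # correlation of X with Y
corr_yx = correlation(y, x)   # same value with the roles swapped
slope_y_on_x = slope(x, y)    # Y regressed on X
slope_x_on_y = slope(y, x)    # X regressed on Y: a different model

print(corr_xy, corr_yx)
print(slope_y_on_x, slope_x_on_y)
```

The two correlations agree, while the two slopes differ, because each regression minimizes errors in a different variable.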

Differences in Model Building Between
Explanatory and Predictive Models

Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. learn more

Why ANOVA is Really a Linear Regression,
Despite the Difference in Notation

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why. And I couldn’t figure it out. The model notation is different. The output looks different. The vocabulary is different. The focus of what we’re testing is completely different. How can they be the same model? learn more

The Impact of Removing the Constant from a
Regression Model: The Categorical Case

In a simple linear regression model, how the constant (a.k.a., intercept) is interpreted depends upon the type of predictor (independent) variable. If the predictor is categorical and dummy-coded, the constant is the mean value of the outcome variable for the reference category only. If the predictor variable is continuous, the constant equals the predicted value of the outcome variable when the predictor variable equals zero. learn more
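The categorical case can be checked with a small sketch in plain Python. The data and the helper `simple_ols` are hypothetical, written only for this illustration. With a 0/1 dummy-coded predictor, the fitted constant should match the mean of the reference category, and the slope should match the difference between the two group means.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: group is dummy-coded (0 = reference, 1 = comparison).
group = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]

intercept, slope = simple_ols(group, outcome)
ref_mean = mean([y for g, y in zip(group, outcome) if g == 0])
cmp_mean = mean([y for g, y in zip(group, outcome) if g == 1])

print(intercept, ref_mean)          # constant vs. reference-group mean
print(slope, cmp_mean - ref_mean)   # slope vs. difference in group means
```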

When to Leave Insignificant Effects in a Model

You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values–they use up precious df in small samples. But if your sample isn’t small, the effect is negligible. learn more

Model Building Strategies: Step Up and Top Down

How should I build my model? I get this question a lot, and it’s difficult to answer at first glance–it depends too much on your particular situation. There are really three parts to the approach to building a model: the strategy, the technique to implement that strategy, and the decision criteria used within the technique. learn more

Five Common Relationships Among Three
Variables in a Statistical Model

In a statistical model–any statistical model–there is generally one way that a predictor X and a response Y can relate. This relationship can take on different forms, of course, like a line or a curve, but there’s really only one relationship here to measure. Usually the point is to model the predictive ability, the effect, of X on Y. learn more

Can a Regression Model with a Small
R-squared Be Useful?

R² is such a lovely statistic, isn’t it? Unlike so many of the others, it makes sense–the percentage of variance in Y accounted for by a model. I mean, you can actually understand that. So can your grandmother. And the clinical audience you’re writing the report for. A big R² is always good and a small one is always bad, right? Well, maybe. learn more

Confusing Statistical Terms #5: Covariate

Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts. Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means. learn more

Making Dummy Codes Easy to Keep Track of

Here’s a little tip. When you construct Dummy Variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results. learn more

3 Situations When it Makes Sense to Categorize a
Continuous Predictor in a Regression Model

In many research fields, particularly those that mostly use ANOVA, a common practice is to categorize continuous predictor variables so they work in an ANOVA. This is often done with median splits—splitting the sample into two categories—the “high” values above the median and the “low” values below the median. There are many reasons why this isn’t such a good idea. learn more

Likert Scale Items as Predictor
Variables in Regression

I was recently asked whether it’s okay to treat a Likert scale item as a continuous predictor in a regression model. Here’s my reply. In the question, the researcher asked about logistic regression, but the same answer applies to all regression models. learn more

Why ANOVA and Linear Regression
Are the Same Analysis

If your graduate statistical training was anything like mine, you learned ANOVA in one class and Linear Regression in another. My professors would often say things like “ANOVA is just a special case of Regression,” but give vague answers when pressed. It was not until I started consulting that I realized how closely related ANOVA and regression are. They’re not only related, they’re the same thing. Not a quarter and a nickel–different sides of the same coin. learn more

Measures of Model Fit for Linear Regression Models

A well-fitting regression model results in predicted values close to the observed data values. The mean model, which uses the mean for every predicted value, generally would be used if there were no useful predictor variables. The fit of a proposed regression model should therefore be better than the fit of the mean model. learn more
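The comparison with the mean model can be sketched in a few lines of plain Python. The data are invented for illustration, and R-squared is used here as one common measure of that comparison (an assumption about which fit measure to show). The sketch computes the sum of squared errors for the mean model and for a simple regression, then expresses the improvement as R-squared.

```python
from statistics import mean

def sum_sq_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Fit the simple regression y = intercept + slope * x by least squares.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# The mean model predicts the mean of y for every observation.
mean_model_pred = [y_bar] * len(y)
regression_pred = [intercept + slope * xi for xi in x]

sse_mean = sum_sq_error(y, mean_model_pred)        # total sum of squares
sse_regression = sum_sq_error(y, regression_pred)  # residual sum of squares

# R-squared: proportional reduction in error relative to the mean model.
r_squared = 1 - sse_regression / sse_mean
print(sse_mean, sse_regression, r_squared)
```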

Understanding Interaction Between Dummy Coded
Categorical Variables in Linear Regression

The concept of a statistical interaction is one of those things that seems very abstract. Obtuse definitions don’t help. But statistical interaction isn’t so bad once you really get it. learn more


stata

Incorporating Graphs in Regression Diagnostics with Stata

You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement. You look at your tables and interpret the results. But first you remember that one or more variables had a few outliers. Did these outliers impact your results? learn more

Linear Regression in Stata: Missing Data and the Stories they Might Tell

In a previous post, we examined how to use the same sample when comparing regression models. Using different samples in our models could lead to erroneous conclusions when interpreting results. But excluding observations can also result in inaccurate results. learn more

Using the Same Sample for Different Models in Stata

In a recent article, I presented a table which examined the impact several predictors have on one’s mental health. At the bottom of the table is the number of observations (N) contained within each sample. The sample sizes are quite large. Does it really matter that they are different? The answer is absolutely yes. Fortunately in Stata it is not a difficult process to use the same sample for all four models shown. learn more

Hierarchical Regression in Stata: An Easy Method to Compare Model Results

An “estimation command” in Stata is a generic term used for a command that runs a statistical model. Examples are regress, anova, poisson, logit, and mixed. Stata has more than 100 estimation commands. Creating the “best” model requires trying alternative models. There are a number of different model building approaches, but regardless of the strategy you take, you’re going to need to compare them. learn more


r

Linear Models in R: Diagnosing Our Regression Model

Last time we created two variables and added a best-fit regression line to our plot of the variables. Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. learn more

Linear Models in R: Plotting Regression Lines

Today let’s re-create two variables and see how to plot them and include a regression line. We take height to be a variable that describes the heights (in cm) of ten people. learn more

R Is Not So Hard! A Tutorial, Part 4: Fitting a Quadratic Model

In Part 4 we will look at more advanced aspects of regression models and see what R has to offer. One way of checking for non-linearity in your data is to fit a polynomial model and check whether the polynomial model fits the data better than a linear model. However, you may also wish to fit a quadratic or higher model because you have reason to believe that the relationship between the variables is inherently polynomial in nature. learn more

R Is Not So Hard! A Tutorial, Part 5: Fitting an Exponential Model

In Parts 3 and 4, we saw how to check for non-linearity in our data by fitting polynomial models and checking whether they fit the data better than a linear model. Now let’s see how to fit an exponential model in R. As before, we will use a data set of counts (atomic disintegration events that take place within a radiation source), taken with a Geiger counter at a nuclear plant. learn more


spss

Order Affects Regression Parameter Estimates in SPSS GLM

When you have an interaction in the model, the order you put terms into the Model statement affects which parameters SPSS gives you. The default in SPSS is to automatically create interaction terms among all the categorical predictors. But if you want fewer than all those interactions, or if you want to put in an interaction involving a continuous variable, you need to choose Model–>Custom Model. learn more