by Jeff Meyer

In a simple linear regression model, how the constant (a.k.a., intercept) is interpreted depends upon the type of predictor (independent) variable.

If the predictor is categorical and dummy-coded, the constant is the mean value of the outcome variable for the reference category only. If the predictor variable is continuous, the constant equals the predicted value of the outcome variable when the predictor variable equals zero.
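A quick numeric sketch (pure Python, with made-up data, not from the article) shows the first interpretation. With a 0/1 dummy-coded predictor, the least-squares intercept works out to exactly the mean of the reference category:

```python
# Hypothetical data: outcome y, dummy-coded predictor x (0 = reference group)
x = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 9.0, 10.0, 11.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Simple linear regression: slope b = cov(x, y) / var(x), intercept a = y_bar - b * x_bar
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# Mean of the outcome in the reference category (x == 0)
ref_mean = sum(yi for xi, yi in zip(x, y) if xi == 0) / 3
print(a, ref_mean)  # 5.0 5.0 -- the intercept equals the reference-group mean
```

The same arithmetic covers the continuous case too: the intercept is the predicted value of y when x equals zero, which for a dummy variable happens to be the reference group.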

Removing the Constant When the Predictor Is Categorical

When your predictor variable X is categorical, the results are logical. Let’s look at an example.

Read the full article →


Count Models: Understanding the Log Link Function

When we fit a statistical model, we are in a sense creating a mathematical equation with two parts: a structural (fixed) component and a random component. The left side of the equation is the outcome variable, which is the sum of the fixed component and the random component. The random component has a probability distribution, and since the outcome variable includes that random component, it too follows a probability distribution. On the right side of the equation is a link function, which is the link between the mean of Y and the structural component. It’s very possible you have run models without being aware of this. Some software packages have modeling commands (e.g., Stata’s poisson and nbreg) that use a default link function. But if you run a generalized linear model (GLM), then you must select the link function that fits your random component.
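To make the link idea concrete, here is a minimal pure-Python sketch (invented counts, not from the article): an intercept-only Poisson model with a log link, fit by Newton-Raphson. The model is linear on the log scale, and the inverse link exp() maps the linear predictor back to a mean count that can never be negative:

```python
import math

# Hypothetical count outcome
y = [2, 3, 1, 0, 4, 2, 3, 1]
n = len(y)

# Intercept-only Poisson GLM with log link: log(mu) = b0, so mu = exp(b0).
# Newton-Raphson on the log-likelihood: score = sum(y) - n*exp(b0), info = n*exp(b0)
b0 = 0.0
for _ in range(50):
    mu = math.exp(b0)
    step = (sum(y) - n * mu) / (n * mu)
    b0 += step
    if abs(step) < 1e-12:
        break

# For the intercept-only model, the inverse link recovers the sample mean count
print(math.exp(b0), sum(y) / n)  # both are (approximately) 2.0
```

This is exactly what a default log link buys you in `poisson` or a GLM: estimation happens on the log scale, but predictions come back on the count scale.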

Read the full article →

December 2016 Webinar: A Gentle Introduction to Generalized Linear Mixed Models – Part 2

Generalized linear mixed models (GLMMs) are incredibly useful tools for working with complex, multi-layered data. But they can be tough to master. In this follow-up to October’s webinar (“A Gentle Introduction to Generalized Linear Mixed Models – Part 1”), you’ll learn the major issues involved in working with GLMMs and how to incorporate these models into your own work.

Read the full article →

The Difference Between Truncated and Censored Data

There are two types of bounded data that have direct implications for how to work with them in analysis: censored and truncated data. Understanding the difference is a critical first step when undertaking a statistical analysis.
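A tiny illustration of the distinction (pure Python, invented values): censored observations are recorded at a bound, so the sample size is preserved; truncated observations never enter the data set at all.

```python
# Hypothetical measurements; suppose the instrument can only read up to 10
raw = [3, 7, 12, 9, 15, 4]
cap = 10

# Censored: values above the cap are recorded AT the cap (n stays at 6)
censored = [min(v, cap) for v in raw]

# Truncated: values above the cap are dropped entirely (n shrinks to 4)
truncated = [v for v in raw if v <= cap]

print(censored)   # [3, 7, 10, 9, 10, 4]
print(truncated)  # [3, 7, 9, 4]
```

The two data sets need different models precisely because one knows *that* large values occurred and the other does not.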

Read the full article →

Principal Component Analysis for Ordinal Scale Items

Principal Component Analysis is really, really useful. You use it to create a single index variable from a set of correlated variables. In fact, the very first step in Principal Component Analysis is to create a correlation matrix (a.k.a., a table of bivariate correlations). The rest of the analysis is based on this correlation matrix. You don’t usually see this step — it happens behind the scenes in your software. Most PCA procedures calculate that first step using only one type of correlation: Pearson. And that can be a problem. Pearson correlations assume all variables are normally distributed. That means they have to be truly quantitative, symmetric, and bell shaped. And unfortunately, many of the variables that we need PCA for aren’t. Likert Scale items are a big one.
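That hidden first step is just a bivariate Pearson correlation computed for every pair of items. A pure-Python sketch of the formula the software applies behind the scenes (illustrative numbers only):

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two hypothetical 5-point Likert items -- ordinal, not truly continuous,
# which is exactly why feeding them to a Pearson-based PCA is questionable
item1 = [1, 2, 2, 3, 4, 5, 5]
item2 = [1, 1, 2, 3, 3, 4, 5]
print(round(pearson(item1, item2), 3))
```

The formula treats the 1-to-5 codes as real quantities with meaningful distances, which is the assumption ordinal items violate.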

Read the full article →

Outliers and Their Origins

Outliers are one of those realities of data analysis that no one can avoid. Those pesky extreme values cause biased parameter estimates, non-normality in otherwise beautifully normal variables, and inflated variances. Everyone agrees that outliers cause trouble with parametric analyses. But not everyone agrees that they’re always a problem, or what to do about them even if they are. Sometimes a nonparametric or robust alternative is available, and sometimes not. There are a number of approaches in statistical analysis for dealing with outliers and the problems they create.
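As a small demonstration of the inflated-variance problem (hypothetical numbers), a single extreme value can dominate the sample variance:

```python
# Hypothetical sample, with and without one extreme value
clean = [10, 11, 9, 10, 12, 10, 11, 9]
with_outlier = clean + [50]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

print(variance(clean))         # about 1.07
print(variance(with_outlier))  # about 176 -- one point inflates it enormously
```

Because variance squares each deviation, the one point 40 units from the mean contributes far more than the other eight observations combined.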

Read the full article →

November 2016 Webinar: The LASSO Regression Model

The LASSO model (Least Absolute Shrinkage and Selection Operator) is a relatively recent development that allows you to find a well-fitting model in the regression context. It avoids many of the overfitting problems that plague other model-building approaches. In this month’s Statistically Speaking webinar, guest instructor Steve Simon, PhD, will explain what overfitting is — and why it’s a problem.
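The “shrinkage” in the name is literal. At the core of common LASSO solvers (e.g., coordinate descent) is the soft-thresholding operator, which pulls every coefficient toward zero and sets small ones to exactly zero — that zeroing-out is what performs variable selection. A hedged sketch of the operator itself (not material from the webinar):

```python
def soft_threshold(z, lam):
    """LASSO soft-thresholding: shrink z toward 0 by lam; set to 0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Coefficients smaller than the penalty are dropped entirely
coefs = [2.5, -0.3, 0.75, -1.5, 0.1]
print([soft_threshold(c, 0.5) for c in coefs])  # [2.0, 0.0, 0.25, -1.0, 0.0]
```

Larger penalties zero out more coefficients, which is how the LASSO trades a little bias for a simpler, less overfit model.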

Read the full article →

Two Types of Effect Size Statistic: Standardized and Unstandardized

Effect size statistics are all the rage these days. Journal editors are demanding them. Committees won’t pass dissertations without them. But the reason to compute them is not just that someone wants them — they can truly help you understand your data analysis.
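For example (made-up group data), an unstandardized effect size is just the raw mean difference in the outcome’s own units, while a standardized one such as Cohen’s d divides that difference by a pooled standard deviation:

```python
import math

# Hypothetical outcome scores for two groups
group_a = [20.0, 22.0, 19.0, 21.0, 23.0]
group_b = [25.0, 27.0, 24.0, 26.0, 28.0]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Unstandardized effect size: difference in means, in the outcome's units
raw_diff = mean(group_b) - mean(group_a)

# Standardized effect size: Cohen's d, using the pooled standard deviation
na, nb = len(group_a), len(group_b)
pooled_sd = math.sqrt(((na - 1) * sample_var(group_a) +
                       (nb - 1) * sample_var(group_b)) / (na + nb - 2))
d = raw_diff / pooled_sd

print(raw_diff, round(d, 2))  # 5.0 points; d is about 3.16
```

The raw difference answers “how many points?”; d answers “how big relative to the noise?” — which is why the two types serve different audiences.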

Read the full article →

October 2016 Membership Webinar: Generalized Linear Mixed Models

In this webinar, we’ll provide a gentle introduction to generalized linear mixed models (or GLMMs). You’ll become familiar with the major issues involved in working with GLMMs so you can more easily transition to using these models in your work.

Read the full article →

Creating Graphs in Stata: From Percentiles to Observe Trends (Part 2)

In a previous post we discussed the difficulties of spotting meaningful information when we work with a large panel data set. Observing the data collapsed into groups, such as quartiles or deciles, is one approach to tackling this challenging task. We showed how this can be done easily in Stata using just 10 lines of code.
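The collapse-into-groups idea can be sketched outside Stata, too. Here is a rough pure-Python analogue (invented data, not the article’s Stata code): assign each observation to a quartile of x, then average the outcome within each quartile.

```python
# Hypothetical (x, y) observations; in Stata this is roughly
# `xtile q = x, nq(4)` followed by `collapse (mean) y, by(q)`
data = [(i, 2 * i + 1) for i in range(1, 13)]  # 12 observations

xs = sorted(v for v, _ in data)
n = len(xs)
# Quartile cut points at the 25th, 50th, and 75th percentile positions
cuts = [xs[n // 4 - 1], xs[n // 2 - 1], xs[3 * n // 4 - 1]]

def quartile(x):
    """Group number 1..4, counting how many cut points x exceeds."""
    return sum(x > c for c in cuts) + 1

groups = {}
for x, y in data:
    groups.setdefault(quartile(x), []).append(y)

means = {q: sum(ys) / len(ys) for q, ys in sorted(groups.items())}
print(means)  # {1: 5.0, 2: 11.0, 3: 17.0, 4: 23.0}
```

Plotting those four group means over time, rather than all the raw observations, is what makes trends in a large panel visible.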

Read the full article →