The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Posts tagged with: predictors

What Is Specification Error in Statistical Models?

by Karen Grace-Martin

When we think about model assumptions, we tend to focus on assumptions like independence, normality, and constant variance. The other big assumption, which is harder to see or test, is that there is no specification error. The assumption of linearity is part of this, but it’s actually a bigger assumption.

What is this assumption of no specification error? [Read more…]

Tagged With: curvilinear effect, interaction, Model Building, predictors, specification error, statistical model, transformation

Related Posts

  • Member Training: Model Building Approaches
  • Differences in Model Building Between Explanatory and Predictive Models
  • Overfitting in Regression Models
  • What It Really Means to Remove an Interaction From a Model

Differences in Model Building Between Explanatory and Predictive Models

by Jeff Meyer, MPA, MBA

Suppose you are asked to create a model that will predict who will drop out of a program your organization offers. You decide to use a binary logistic regression because your outcome has two values: “0” for not dropping out and “1” for dropping out.

Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. Where do we go from here? [Read more…]
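One key difference the post alludes to: a predictive model is judged on data it has never seen, not on p-values. Here is a minimal sketch of that workflow with simulated data; the predictors (hours absent, GPA) and all coefficients are hypothetical, invented for illustration only:

```python
# Minimal predictive-modeling sketch with SIMULATED dropout data.
# The predictor names and coefficients are hypothetical, not from the post.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
hours_absent = rng.normal(10, 3, n)             # hypothetical predictor
gpa = rng.normal(3.0, 0.5, n)                   # hypothetical predictor
log_odds = -2 + 0.3 * hours_absent - 1.0 * gpa  # assumed true relationship
p_drop = 1 / (1 + np.exp(-log_odds))
dropped_out = rng.binomial(1, p_drop)           # 1 = dropped out, 0 = stayed

X = np.column_stack([hours_absent, gpa])
# For prediction, performance on UNSEEN data is what matters,
# so we score the model on a held-out validation set, not on training fit.
X_train, X_val, y_train, y_val = train_test_split(
    X, dropped_out, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)        # accuracy on validation data
```

The train/validation split is the heart of the predictive approach: a model that fits the training data beautifully but scores poorly on the validation set is overfit.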

Tagged With: explanatory models, Model Building, overfitting, predictive models, predictors, significance testing, Training Data, validation data

Related Posts

  • Overfitting in Regression Models
  • What It Really Means to Remove an Interaction From a Model
  • Simplifying a Categorical Predictor in Regression Models
  • Descriptives Before Model Building

Member Training: Quantile Regression: Going Beyond the Mean

by a guest contributor

In your typical statistical work, chances are you have already used quantiles, such as the median or the 25th and 75th percentiles, as descriptive statistics.

But did you know quantiles are also valuable in regression, where they can answer a broader set of research questions than standard linear regression?

In standard linear regression, the focus is on estimating the mean of a response variable given a set of predictor variables.

In quantile regression, we can go beyond the mean of the response variable. Instead we can understand how predictor variables predict (1) the entire distribution of the response variable or (2) one or more relevant features (e.g., center, spread, shape) of this distribution.

For example, quantile regression can help us understand not only how age predicts the mean or median income, but also how age predicts the 75th or 25th percentile of the income distribution.

Or we can see how the inter-quartile range — the width between the 75th and 25th percentile — is affected by age. Perhaps the range becomes wider as age increases, signaling that an increase in age is associated with an increase in income variability.

In this webinar, we will help you become familiar with the power and versatility of quantile regression by discussing topics such as:

  • Quantiles – a brief review of their computation, interpretation and uses;
  • Distinction between conditional and unconditional quantiles;
  • Formulation and estimation of conditional quantile regression models;
  • Interpretation of results produced by conditional quantile regression models;
  • Graphical displays for visualizing the results of conditional quantile regression models;
  • Inference and prediction for conditional quantile regression models;
  • Software options for fitting quantile regression models.

Join us for this webinar to understand how quantile regression can expand the scope of research questions you can address with your data.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

[Read more…]

Tagged With: distribution, linear regression, percentile, predictor variable, predictors, quantile regression, quantiles, Regression

Related Posts

  • What is Multicollinearity? A Visual Description
  • Member Training: Mediated Moderation and Moderated Mediation
  • Member Training: The Link Between ANOVA and Regression
  • Member Training: Centering

Analyzing Zero-Truncated Count Data: Length of Stay in the ICU for Flu Victims

by Jeff Meyer

It’s that time of year: flu season.

Let’s imagine you have been asked to determine the factors that will help a hospital determine the length of stay in the intensive care unit (ICU) once a patient is admitted.

The hospital tells you that once a patient is admitted to the ICU, their day count is one. As soon as they have stayed 24 hours plus one minute, they are counted as staying an additional day.

Clearly this is count data. There are no fractions, only whole numbers. And because every admitted patient stays at least one day, a count of zero is impossible: the data are zero-truncated.
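Why the truncation matters can be sketched with simulated counts. The rate parameter below is an assumed value, not estimated from the Illinois data; the point is only that discarding zeros shifts the mean upward, which is why a plain Poisson model misstates it:

```python
# Zero-truncation sketch with SIMULATED counts (lam is an assumed value,
# not estimated from the Illinois ICU data).
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
draws = rng.poisson(lam, 200_000)
stays = draws[draws > 0]   # every admitted patient has at least 1 day

# For a zero-truncated Poisson, E[Y | Y > 0] = lam / (1 - exp(-lam)),
# which is strictly larger than lam. A plain Poisson model, which puts
# probability mass on zero, would therefore misstate the mean stay.
theoretical_mean = lam / (1 - np.exp(-lam))
observed_mean = stays.mean()
```

With lam = 2.0 the truncated mean is about 2.31 days rather than 2.0, and the gap grows as lam shrinks, which is exactly why zero-truncated count models exist.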

To help us explore this analysis, let’s look at real data from the State of Illinois. We know the patients’ ages, gender, race and type of hospital (state vs. private).

A partial frequency distribution looks like this: [Read more…]

Tagged With: Count data, linear regression, negative binomial, poisson, predictors, Truncated

Related Posts

  • The Problem with Linear Regression for Count Data
  • The Importance of Including an Exposure Variable in Count Models
  • Poisson or Negative Binomial? Using Count Model Diagnostics to Select a Model
  • Getting Accurate Predicted Counts When There Are No Zeros in the Data

Introduction to Logistic Regression

by Karen Grace-Martin

Researchers are often interested in setting up a model to analyze the relationship between some predictors (i.e., independent variables) and a response (i.e., dependent variable). Linear regression is commonly used when the response variable is continuous. One assumption of linear models is that the residual errors follow a normal distribution. This assumption fails when the response variable is categorical, so an ordinary linear model is not appropriate. This article presents a regression model for a response variable that is dichotomous, i.e., has two categories. Examples are common: whether a plant lives or dies, whether a survey respondent agrees or disagrees with a statement, or whether an at-risk child graduates from or drops out of high school.

In ordinary linear regression, the response variable (Y) is a linear function of the coefficients (B0, B1, etc.) that correspond to the predictor variables (X1, X2, etc.). A typical model would look like:

Y = B0 + B1*X1 + B2*X2 + B3*X3 + … + E

For a dichotomous response variable, we could set up a similar linear model to predict individuals’ category memberships if numerical values are used to represent the two categories. Arbitrary values of 1 and 0 are chosen for mathematical convenience. Using the first example, we would assign Y = 1 if a plant lives and Y = 0 if a plant dies.

This linear model does not work well for a few reasons. First, the response values, 0 and 1, are arbitrary, so modeling the actual values of Y is not really of interest. Second, what we actually want to model is the probability that each individual in the population responds with 0 or 1. For example, we may find that plants with a high level of fungal infection (X1) fall into the category "the plant lives" (Y = 1) less often than plants with a low level of infection. Thus, as the level of infection rises, the probability of a plant living decreases.

Thus, we might consider modeling P, the probability, as the response variable. Again, there are problems. Although the probability of living generally decreases as the infection level increases, we know that P, like all probabilities, can only fall within the boundaries of 0 and 1. Consequently, it is better to assume that the relationship between X1 and P is sigmoidal (S-shaped) rather than a straight line.
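A quick numeric sketch of that point, using illustrative coefficients (these numbers are invented, not from any real plant study): a straight line for P escapes the [0, 1] interval, while a sigmoid stays inside it.

```python
# Sketch: a straight line for P leaves [0, 1]; a sigmoid never does.
# The infection range and all coefficients are illustrative only.
import numpy as np

infection = np.linspace(0, 10, 101)                     # hypothetical X1
linear_p = 0.9 - 0.15 * infection                       # straight-line "P"
sigmoid_p = 1 / (1 + np.exp(-(2.0 - 0.8 * infection)))  # logistic curve

# The straight line drops below 0 at high infection levels, which is
# impossible for a probability; the sigmoid flattens out near 0 instead.
line_escapes = bool(((linear_p < 0) | (linear_p > 1)).any())
sigmoid_bounded = bool(((sigmoid_p >= 0) & (sigmoid_p <= 1)).all())
```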

It is possible, however, to find a linear relationship between X1 and a function of P. Although a number of functions work, one of the most useful is the logit function. It is the natural log of the odds that Y is equal to 1, where the odds are simply the probability that Y is 1 divided by the probability that Y is 0. The relationship between the logit of P and P itself is sigmoidal in shape. The resulting regression equation is:

ln[P/(1-P)] = B0 + B1*X1 + B2*X2 + …

Although the left side of this equation looks intimidating, expressing the probability this way makes the right side of the equation linear and familiar. This helps us understand the meaning of the regression coefficients, which can easily be transformed so that their interpretation makes sense.

The logistic regression equation can be extended beyond the case of a dichotomous response variable to the cases of ordered categories and polytomous categories (more than two unordered categories).



Tagged With: binary variable, dichotomous response, log-odds, logistic regression, ordered categories, polytomous categories, predictors, sigmoidal relationship

Related Posts

  • Why Logistic Regression for Binary Response?
  • When Linear Models Don’t Fit Your Data, Now What?
  • Member Training: Explaining Logistic Regression Results to Non-Researchers
  • How to Decide Between Multinomial and Ordinal Logistic Regression Models

Copyright © 2008–2023 The Analysis Factor, LLC.
All rights reserved.
