by Steve Simon, PhD

The Cox regression model has a fairly minimal set of assumptions, but how do you check those assumptions and what happens if those assumptions are not satisfied?

Non-proportional hazards

The proportional hazards assumption is so important to Cox regression that we often include it in the name (the Cox proportional hazards model). What it essentially means is that the ratio of the hazards for any two individuals is constant over time. They're proportional. The concept involves logarithms and can feel strange, so in this article we're going to focus on how to tell when you don't have it.
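For readers who want to see where the logarithms come in, here is the model in its usual textbook form. The notation, h_0(t) for the baseline hazard and β for the coefficient of a covariate x, is standard but is not introduced in this article. The baseline hazard cancels when you take the ratio for two individuals:

```latex
h(t \mid x) = h_0(t)\, e^{\beta x}
\qquad\Longrightarrow\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)}
  = \frac{h_0(t)\, e^{\beta x_1}}{h_0(t)\, e^{\beta x_2}}
  = e^{\beta (x_1 - x_2)}
```

Nothing on the right-hand side depends on t, which is what "proportional" means. On the log scale, log h(t | x) = log h_0(t) + βx, so the covariate shifts the log hazard by the same amount at every time.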

There are several graphical methods for spotting this violation, but the simplest is an examination of the Kaplan-Meier curves.

If the curves cross, as shown below, then you have a problem.

Figure 1. Graph of crossing survival curves

Likewise, if one curve levels off while the other drops to zero, you have a problem.

Figure 2. Kaplan-Meier curves with only one curve leveling off
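If you want to see how the curves in these figures are built, here is a minimal pure-Python sketch of the Kaplan-Meier estimator. The function names (kaplan_meier, surv_at, curves_cross) and the two small data sets are hypothetical, made up for illustration. The crossing check is only a crude heuristic that flags when one estimated curve lies above the other at some times and below it at others. It is not a formal test, and with small samples curves can cross by chance.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t)) pairs.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) tied at time t;
        # subjects censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def surv_at(curve, t):
    """Value of the step function S at time t (1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def curves_cross(curve_a, curve_b, tol=1e-12):
    """Crude check: does S_a(t) - S_b(t) take both signs over the event times?"""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    signs = set()
    for t in grid:
        diff = surv_at(curve_a, t) - surv_at(curve_b, t)
        if diff > tol:
            signs.add(1)
        elif diff < -tol:
            signs.add(-1)
    return signs == {1, -1}


# Hypothetical data: group A has early events and then levels off
# (remaining subjects censored); group B has no early events but all fail later.
group_a = kaplan_meier([1, 2, 8, 9, 10], [1, 1, 0, 0, 0])
group_b = kaplan_meier([3, 4, 5, 6, 7], [1, 1, 1, 1, 1])
print(group_a)
print(group_b)
print("curves cross:", curves_cross(group_a, group_b))
```

The made-up data mimic both warning signs at once: group A levels off while group B drops to zero, and along the way the two curves cross. Because both estimates are step functions that change only at event times, checking the union of event times is enough to see every value the difference takes.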

You can think of non-proportional hazards as an interaction of your independent variable with time. It means that you have to do more work in interpreting your model. If you ignore this problem, you may also experience a serious loss in power.
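To make the "interaction with time" idea concrete, here is a small sketch that assumes one common way of writing it: adding a covariate-by-log(time) term to the log hazard. The article does not specify a time function, so log(t) is an assumption here, and the coefficient values are hypothetical.

```python
import math


def hazard_ratio(t, beta, gamma):
    """Hazard ratio for x = 1 versus x = 0 at time t, when the log hazard is
    log h0(t) + beta * x + gamma * x * log(t).

    gamma = 0 gives an ordinary proportional-hazards model (constant ratio);
    gamma != 0 makes the ratio drift with time: HR(t) = exp(beta) * t**gamma.
    """
    return math.exp(beta + gamma * math.log(t))


# Hypothetical coefficients, chosen only for illustration.
beta, gamma = 0.7, -0.8
for t in (0.5, 1.0, 2.0, 4.0):
    ph = hazard_ratio(t, beta, 0.0)
    non_ph = hazard_ratio(t, beta, gamma)
    print(f"t={t}: proportional HR={ph:.3f}, time-varying HR={non_ph:.3f}")

# With beta > 0 and gamma < 0, the ratio drops below 1 once t > exp(-beta / gamma).
print("ratio equals 1 at t =", math.exp(-beta / gamma))
```

With gamma = 0 the printed ratio is the same at every time. With the negative gamma it starts above 1 and falls below 1 later, which is the kind of reversal that shows up as crossing survival curves and why a single hazard ratio no longer summarizes the effect.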

If you have evidence of non-proportional hazards, don’t despair. There are several fairly simple modifications to the Cox regression model that will work for you.

Nonlinear covariate relationships

The Cox model assumes that each variable makes a linear contribution to the log hazard, but sometimes the relationship may be more complex.

You can diagnose this problem graphically using residual plots. The residual in a Cox regression model is not as simple to compute as the residual in linear regression, but you look for the same sort of pattern as in linear regression.
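The residuals most often used for this check are martingale residuals. Here is a minimal pure-Python sketch, assuming the null-model variant: residuals from a Cox model with no covariates, which reduce to each subject's observed event indicator minus the Nelson-Aalen cumulative hazard at that subject's time. Binned means stand in for a proper scatterplot smoother. The function names and the data (including the "age" covariate) are hypothetical. In practice your statistical software computes these residuals for you.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard: H(t) = sum over event times t_j <= t of
    d_j / n_j (events over number at risk). Returns {time: H(time)}."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum = 0.0
    H = {}
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        cum += deaths / n_at_risk
        H[t] = cum
        n_at_risk -= removed
    return H


def null_martingale_residuals(times, events):
    """Martingale residuals from a Cox model with no covariates:
    r_i = event_i - H(t_i), observed minus 'expected' events for subject i."""
    H = nelson_aalen(times, events)
    return [d - H[t] for t, d in zip(times, events)]


def binned_means(x, resid, n_bins=3):
    """Crude stand-in for a scatterplot smoother: mean residual within
    roughly equal-sized groups of the covariate x."""
    pairs = sorted(zip(x, resid))
    n = len(pairs)
    summary = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            summary.append((sum(p[0] for p in chunk) / len(chunk),
                            sum(p[1] for p in chunk) / len(chunk)))
    return summary


# Hypothetical data for illustration only.
times = [2, 3, 5, 7, 8, 11, 12, 15, 20]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1]
age = [71, 66, 58, 62, 49, 55, 44, 40, 52]
resid = null_martingale_residuals(times, events)
for mean_age, mean_resid in binned_means(age, resid):
    print(f"age ~{mean_age:.1f}: mean residual {mean_resid:+.3f}")
```

Plotted against the covariate, the smoothed residuals trace out the approximate shape of that covariate's effect on the log hazard. A roughly straight trend supports the linear term, while clear curvature or a threshold suggests a transformation, a spline, or categories. These residuals always sum to zero and can never exceed 1, so they are skewed; that is one reason to look at a smoothed trend rather than the raw points.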

If you have a nonlinear relationship, you have several options that parallel your choices in a linear regression model.

Lack of independence

Lack of independence is not something that you have to wait to diagnose until your data is collected. Often it is something you are aware of from the start because certain features of the design, such as centers in a multi-center study, are likely to produce correlated outcomes. These are the same issues that hound you with a linear regression model in a multi-center study.

There are several ways to account for lack of independence, but this is one problem you don’t want to ignore. An invalid model will ruin all your confidence intervals and p-values.


Parametric or Semi-Parametric Models in Survival Analysis?

Parametric models for survival data don’t work well with the normal distribution. The distributions that work well for survival data include the exponential, Weibull, gamma, and lognormal distributions among others. These distributions give you a broad range of hazard functions…
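As a quick illustration of that range, here is a sketch comparing the hazard shapes of three of these distributions. It assumes the common shape/scale parameterization of the Weibull (parameterizations differ between textbooks and software) and leaves out the gamma distribution to stay within the Python standard library. The exponential is the special case of the Weibull with shape 1.

```python
import math


def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), t > 0.
    shape = 1 gives the exponential distribution's constant hazard 1 / scale."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def lognormal_hazard(t, mu=0.0, sigma=1.0):
    """Lognormal hazard f(t) / S(t), where log(T) ~ Normal(mu, sigma**2)."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return density / survival


for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"t={t:<5} exponential={weibull_hazard(t, 1.0):.3f} "
          f"Weibull(k=2)={weibull_hazard(t, 2.0):.3f} "
          f"Weibull(k=0.5)={weibull_hazard(t, 0.5):.3f} "
          f"lognormal={lognormal_hazard(t):.3f}")
```

The printout shows a constant hazard (exponential), a rising one (Weibull with shape above 1), a falling one (Weibull with shape below 1), and one that rises and then falls (lognormal). Choosing among them is a statement about how risk changes over time, which the semi-parametric Cox model leaves unspecified.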

Read the full article →

Six Types of Survival Analysis and Challenges in Learning Them

Survival analysis isn’t just a single model.

It’s a whole set of tests, graphs, and models that are all used in slightly different data and study design situations. Choosing the most appropriate model can be challenging.

In this article I will describe the most common types of tests and models in survival analysis, how they differ, and some challenges to learning them.

Read the full article →

August 2018 Member Webinar: Power Analysis and Sample Size Determination Using Simulation

Read the full article →

What is Survival Analysis and When Can It Be Used?

There are two features of survival models.

First is the process of measuring the time in a sample of people, animals, or machines until a specific event occurs. In fact, many people use the term “time to event analysis” or “event history analysis” instead of “survival analysis” to emphasize the broad range of areas where you can apply these techniques.

Read the full article →

The Problem with Using Tests for Statistical Assumptions

Every statistical model and hypothesis test has assumptions. And yes, if you’re going to use a statistical test, you need to check whether those assumptions are reasonable to whatever extent you can. Some assumptions are easier to check than others. Some are so obviously reasonable that you don’t need to do much to check them […]

Read the full article →

Using Marginal Means to Explain an Interaction to a Non-Statistical Audience

You show this table in your PowerPoint presentation because you know your audience is expecting some statistics, though they don’t really understand them. You begin by explaining that the constant (_cons) represents the mean BMI of small frame women. You have now lost half of your audience because they have no idea why the constant represents small frame women.

By the time you start explaining the interaction you have lost 95% of your audience.

Read the full article →

July 2018 Member Webinar: Logistic Regression for Count and Proportion Data

In this webinar you will learn what these variables are, get an introduction to the relationships among the Poisson, Bernoulli, Binomial, and Normal distributions, and see an example of how to set up the data and how to specify and interpret the logistic model for these kinds of variables.

Read the full article →

Life After Exploratory Factor Analysis: Estimating Internal Consistency

by Christos Giannoulis, PhD

After you are done with the odyssey of exploratory factor analysis (aka a reliable and valid instrument)…you may find yourself at the beginning of a journey rather than the ending. The process of performing exploratory factor analysis usually seeks to answer whether a given set of items form a coherent factor […]

Read the full article →

Confirmatory Factor Analysis: How To Measure Something We Cannot Observe or Measure Directly

Any time we want to measure something in science, we have to take into account that our measurements contain various kinds of error. That error can be random, systematic, or both. So what we want to do in our statistical approach to the data is to isolate the true score in a variable and remove the error. This is really what we're trying to do when we use latent variables for measurement.

Read the full article →