In many research fields, particularly those that rely mostly on ANOVA, a common practice is to categorize continuous predictor variables so they can be used as factors in an ANOVA. This is often done with a median split: the sample is divided into two categories, the “high” values above the median and the “low” values below it. There are several reasons why this isn’t such a good idea:
- the median varies from sample to sample, so the categories mean different things in different samples
- all values on one side of the median are treated as equivalent, so any variation within a category is ignored, while two values right next to each other on either side of the median are treated as different
- the categorization is completely arbitrary. A “high” score isn’t necessarily high. If the scale is skewed, as many are, even a value near the low end can end up in the “high” category.
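To make these three problems concrete, here is a minimal sketch in Python. The two samples are made-up scores on a hypothetical 0 to 100 scale, and `median_split` and `label` are illustrative helper names, not functions from any statistics package. The source doesn't say what to do with values that fall exactly on the median, so the sketch assumes they go in the “low” group.

```python
from statistics import median


def label(value, cut):
    """Return 'high' if value is above the cut point, else 'low'.

    Assumption: values exactly equal to the cut are labeled 'low'.
    """
    return "high" if value > cut else "low"


def median_split(values):
    """Return the sample median and a 'high'/'low' label for each value."""
    cut = median(values)
    return cut, [label(v, cut) for v in values]


# Made-up scores on a hypothetical 0-100 scale.
sample_a = [2, 3, 3, 4, 5, 6, 8, 12, 40, 95]  # strongly right-skewed
sample_b = [10, 20, 30, 40, 50, 60]           # roughly symmetric

cut_a, labels_a = median_split(sample_a)
cut_b, labels_b = median_split(sample_b)

# 1. The cut point depends on the sample, so the same score can land
#    in different categories in different samples.
print("median A:", cut_a, "| median B:", cut_b)
print("score 12 in A:", label(12, cut_a), "| in B:", label(12, cut_b))

# 2. Neighbors across the median are separated, while very different
#    values on the same side are lumped together.
for value, group in zip(sample_a, labels_a):
    print(value, group)

# 3. With a skewed scale, a score near the bottom of the range
#    can still be labeled "high".
print("score 6 out of 100 in A:", label(6, cut_a))
```

In sample A, scores of 5 and 6 end up in different groups while 6 and 95 end up in the same one, and a score of 12 counts as “high” in sample A but “low” in sample B.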
But it can be very useful and legitimate to be able to choose whether to treat an independent variable as categorical or continuous. Knowing when it is appropriate…