I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.
But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be comfortable with both ANOVA and regression approaches, particularly with dummy coding.
It’s true that median splits are arbitrary, but there are situations when it is reasonable to make a continuous numerical variable categorical.
One of these situations is when the continuous variable has meaningful cut points.
Many scales are designed and tested to have specific cut points. For example, a depression scale with scores from 1 to 20 will have specific scores at which a person is considered clinically depressed, mildly depressed, or not depressed. Another example is the BMI cutoffs for underweight, normal weight, overweight, and obese. Although these categories are indeed grouping together continuous values, the key is they’re not arbitrary. The scales are designed and tested to determine the optimal cut points.
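To make the idea concrete, here is a minimal sketch of turning a continuous BMI value into its category. It assumes the standard WHO adult cutoffs (18.5, 25, and 30 kg/m²), which the text above does not spell out. The function name `bmi_category` and the data are made up for illustration.

```python
from bisect import bisect_right

# Assumed WHO adult BMI cut points (kg/m^2).
# Each cut point is the inclusive lower bound of the next category.
BMI_CUTS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal weight", "overweight", "obese"]

def bmi_category(bmi):
    """Return the label of the category that contains this BMI value."""
    # bisect_right counts how many cut points are <= bmi,
    # which is exactly the index of the matching label.
    return BMI_LABELS[bisect_right(BMI_CUTS, bmi)]

bmi_values = [17.2, 18.5, 24.9, 25.0, 31.4]  # made-up example data
categories = [bmi_category(b) for b in bmi_values]
print(categories)
```

Because the cut points live in one list, the original continuous values stay untouched and you can switch between the continuous and categorical versions of the variable whenever the analysis calls for it.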
Likewise, some cut points are naturally occurring. For example, a variable like child’s age has meaningful categories built in: infants, pre-schoolers, school-age children, and adolescents have qualitative differences in the way their age affects other variables.
How meaningful these categories are may depend on the topic of the research. Child’s age will have different meaningful cut points in a study of mother’s employment situation (before/after school age) than in a study of gross motor skill development, even if both studies cover the same age range of children.
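As a rough sketch of that last point, the same ages can be grouped two different ways depending on the research question. The cut points and labels below are hypothetical, chosen only for illustration and not taken from any validated scale. The helper `categorize` is likewise made up.

```python
from bisect import bisect_right

def categorize(value, cuts, labels):
    """Map a value to a label; each cut is the inclusive lower bound of the next label."""
    return labels[bisect_right(cuts, value)]

# Hypothetical cut points (ages in years), for illustration only.
EMPLOYMENT_CUTS = [5]
EMPLOYMENT_LABELS = ["before school age", "school age"]

MOTOR_CUTS = [1, 3, 6]
MOTOR_LABELS = ["infant", "toddler", "pre-schooler", "school-age"]

ages = [0.5, 2, 4, 5, 7]  # made-up ages

# Same continuous variable, two different categorical versions.
by_employment = [categorize(a, EMPLOYMENT_CUTS, EMPLOYMENT_LABELS) for a in ages]
by_motor = [categorize(a, MOTOR_CUTS, MOTOR_LABELS) for a in ages]

for age, emp, motor in zip(ages, by_employment, by_motor):
    print(age, emp, motor)
```

The 4-year-old and the 5-year-old land in different groups under the employment scheme but in the same group under the motor-skill scheme, which is the sense in which the right cut points depend on the question being asked.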