A very common question is whether it is legitimate to use Likert scale data in parametric statistical procedures that require interval data, such as Linear Regression, ANOVA, and Factor Analysis.
A typical Likert scale item has 5 to 11 points that indicate the degree of something. For example, it could measure agreement with a statement, such as 1=Strongly Disagree to 5=Strongly Agree. It can be a 1 to 5 scale, 0 to 10, etc.
The issue is that despite having numbers, a Likert scale item is in fact a set of ordered categories. The numerals that are attached to the different categories aren’t really quantitative. They describe order of responses, but not really quantity.
And yet, ultimately what the item is attempting to measure is amount of agreement. Shouldn’t that be treated as quantitative, if it’s really an amount?
One camp maintains that as ordered categories, the intervals between the scale values are not equal. So even if there is a true quantitative amount to the variable we’re attempting to measure, we’re actually measuring it only at discrete points, creating ordinal categories.
This camp claims that any mean, correlation, or other numerical operation applied to the categorical numerals is invalid. Only nonparametric statistics or other analyses for ordered data are appropriate for Likert item data (e.g., Jamieson, 2004).
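To make the "measured only at discrete points" argument concrete, here is a minimal sketch of how a continuous amount of agreement could turn into ordered categories. The cut points in `THRESHOLDS` and the function `to_likert` are hypothetical, chosen only for illustration; nothing in the cited papers specifies them.

```python
import bisect

# Hypothetical cut points on a latent "agreement" continuum (illustrative only).
# They are unequally spaced, so each category covers an interval of different width.
THRESHOLDS = [-1.5, -0.25, 0.25, 1.5]

def to_likert(latent_value, thresholds=THRESHOLDS):
    """Map a continuous latent value to an ordered category 1..len(thresholds)+1."""
    return bisect.bisect_right(thresholds, latent_value) + 1

# Made-up latent values for six respondents.
latent = [-2.0, -1.0, 0.0, 0.2, 1.0, 3.0]
responses = [to_likert(x) for x in latent]
print(responses)  # [1, 2, 3, 3, 4, 5]
```

Notice that the respondents at 1.0 and 3.0 differ by two units on the latent continuum, and the respondents at -1.0 and 0.0 differ by only one, yet both pairs end up exactly one category apart. That is the sense in which the numerals preserve order but not necessarily quantity.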
The other camp maintains that yes, technically the Likert scale item is ordered. Even so, parametric tests can be practically valid in some situations.
Additionally, tests that assume real numerical data still tell you a lot about what’s going on with this variable. They’re easier to run and easier to communicate.
For example, Lubke & Muthen (2004) found that it is possible to find true parameter values in factor analysis with Likert item data, if assumptions about skewness, minimum number of categories, etc., were met. Likewise, Glass et al. (1972) found that F tests in ANOVA could return accurate p-values on Likert items under certain conditions.
Meanwhile, the debate rages on.
So, what is a researcher with integrity supposed to do? In the absence of a definitive answer, these are my recommendations:
- Understand the difference between a Likert item and a Likert Scale. A true Likert scale, as Likert defined it, is made up of many items, which all measure the same attitude. But many people use the term “Likert Scale” to refer to a single item from that scale. Confusion about what a Likert Scale is has no doubt contributed to the debate.
- Proceed with caution. Research the consequences of using your procedure on Likert scale data, given your study design and the variables you are measuring. The fact that everyone uses it is not sufficient justification. The violation is more egregious in some circumstances and procedures than in others. You bear the burden of justifying why it’s okay to use numerical procedures for ordinal data.
- At the very least, insist that you’ll only treat it as numerical under certain conditions. All of these must be true: that the item have at least 7 values; that the underlying construct you’re measuring be continuous; and that there be some indication that the intervals between points are approximately equal. Likewise, make sure the other assumptions of your test are reasonable to make (e.g., normality and equal variance of residuals).
- When you can, run the non-parametric equivalent to your test, or whatever alternate test exists that doesn’t assume numerical data. If you get the same results, you can be confident about your conclusions. So even if you choose to report the numerical results, you can explain, maybe in a footnote, all the tests you ran and the similar results you found. Transparency is always good science.
- If you do choose to use Likert data in a parametric procedure, make sure you have strong results before making claims. Set criteria for yourself of larger effect sizes, to ensure that non-zero effects really exist, even if you’ve measured your effect with some error. Use a more stringent alpha level, like .01 or even .005, instead of .05. If you have p-values of .001 or .45, it’s pretty clear what the result is, even if parameter estimates are slightly biased. It’s when p-values are close to .05 that the effect of bending assumptions is unclear.
- Consider the consequences of reporting inaccurate results. Will anyone ever read your paper? Will your research be published? Will others use it to shape public policy or affect practices? The answers to these questions can inform the seriousness of potential problems.
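As a rough illustration of the advice to run both the parametric test and its non-parametric equivalent, and to hold both to a stringent alpha, here is a minimal sketch using only the Python standard library. The two groups of 7-point responses are made up, and the helper names (`welch_z_pvalue`, `mann_whitney_pvalue`, `conclusions_agree`) are hypothetical. Both p-values use large-sample normal approximations to keep the example dependency-free; in a real analysis you would likely use a statistics package with exact or t-based p-values.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_z_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Uses the Welch (unequal variance) statistic with a normal
    approximation instead of the t distribution.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def mann_whitney_pvalue(a, b):
    """Two-sided Mann-Whitney U p-value.

    Normal approximation with tie correction, no continuity correction.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted(a + b)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t**3 - t
        i = j
    u1 = sum(ranks[x] for x in a) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))

def conclusions_agree(p_parametric, p_nonparametric, alpha=0.01):
    """True when both tests fall on the same side of a stringent alpha."""
    return (p_parametric < alpha) == (p_nonparametric < alpha)

# Made-up responses on a 7-point item for two groups (illustrative only).
group_a = [5, 6, 6, 7, 5, 6, 7, 6, 5, 7, 6, 6, 7, 5, 6, 6, 7, 6, 5, 6]
group_b = [3, 4, 2, 4, 3, 5, 3, 4, 2, 3, 4, 3, 5, 4, 3, 2, 4, 3, 4, 3]

p_t = welch_z_pvalue(group_a, group_b)
p_u = mann_whitney_pvalue(group_a, group_b)
print(f"parametric p = {p_t:.4g}, non-parametric p = {p_u:.4g}")
print("same conclusion at alpha = .01:", conclusions_agree(p_t, p_u))
```

When the two tests disagree, or both p-values hover near your alpha, that is the signal to be cautious about which result you report and to say so in your write-up.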
Carifio, J. & Perla, R. (2007). Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes. Journal of Social Sciences, 2, 106-116. http://thescipub.com/PDF/jssp.2007.106.116.pdf
Glass, G. V., Peckham, P. D. & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analyses of variance and covariance. Review of Educational Research, 42, 237-288.
Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38, 1212-1218.
Lubke, G. H. & Muthen, B. O. (2004). Applying Multigroup Confirmatory Factor Models for Continuous Outcomes to Likert Scale Data Complicates Meaningful Group Comparisons. Structural Equation Modeling, 11, 514-534.