A very common question is whether it is legitimate to use Likert scale data in parametric statistical procedures that require interval data, such as Linear Regression, ANOVA, and Factor Analysis.

A typical Likert scale item has 5 to 11 points that indicate the degree of something. For example, it could measure agreement with a statement, such as 1=Strongly Disagree to 5=Strongly Agree. It can be a 1 to 5 scale, 0 to 10, etc.

### The Debate

The issue is that despite having numbers, a Likert scale item is in fact a set of ordered categories. The numerals that are attached to the different categories aren’t really quantitative. They describe order of responses, but not really quantity.

And yet, ultimately what the item is attempting to measure is *amount* of agreement. Shouldn’t that be treated as quantitative, if it’s really an amount?

One camp maintains that, because the scale values are ordered categories, the intervals between them are not equal. So even if there is a true quantitative amount to the variable we’re attempting to measure, we’re actually measuring it only at discrete points, creating ordinal categories.

This camp claims that any mean, correlation, or other numerical operation applied to the categorical numerals is invalid. Only nonparametric statistics or other analyses for ordered data are appropriate for Likert item data (e.g. Jamieson, 2004).
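To make the first camp’s objection concrete, here is a small Python sketch with invented responses (hypothetical data, not from any study). Two codings keep the five categories in the same order but space them differently. Under equal spacing, group B has the higher mean. Compressing the gap between “Agree” (4) and “Strongly Agree” (5) reverses that, while group B’s median stays higher under both codings.

```python
from statistics import mean, median

# Hypothetical responses to one 1-5 item (invented for illustration)
group_a = [4, 4, 4, 4]
group_b = [5, 5, 5, 2]

# Two codings that keep the five categories in the same order
EQUAL_SPACING = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
# Assumption for illustration: "Strongly Agree" sits only a little above "Agree"
COMPRESSED_TOP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 4.5}


def summarize(coding):
    """Return (mean_a, mean_b, median_a, median_b) after recoding both groups."""
    a = [coding[r] for r in group_a]
    b = [coding[r] for r in group_b]
    return mean(a), mean(b), median(a), median(b)


for label, coding in [("equal spacing", EQUAL_SPACING), ("compressed top", COMPRESSED_TOP)]:
    mean_a, mean_b, median_a, median_b = summarize(coding)
    print(f"{label}: means A={mean_a:.3f}, B={mean_b:.3f}; medians A={median_a}, B={median_b}")
```

Under equal spacing the means are 4 and 4.25. Under the compressed coding they are 4 and 3.875, so the conclusion about which group “agrees more” depends on spacing that the item itself never measured. Rank-based summaries such as the median depend only on order, which is why the first camp prefers them.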

The other camp maintains that yes, *technically* the Likert scale item is ordinal. Even so, parametric tests can be *practically* valid in some situations.

Additionally, tests that assume real numerical data still tell you a lot about what’s going on with this variable. They’re easier to run and easier to communicate.

For example, Lubke & Muthen (2004) found that it is possible to recover true parameter values in factor analysis with Likert item data if assumptions about skewness, the minimum number of categories, etc., were met. Likewise, Glass et al. (1972) found that F tests in ANOVA could return accurate p-values on Likert items under certain conditions.
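A small simulation can show what “under certain conditions” means in practice. The sketch below uses Python with NumPy and SciPy. The response probabilities, group size, and number of replications are arbitrary assumptions, not values from Glass et al. It draws two groups from the same 5-point distribution, so the null hypothesis is true, and counts how often the pooled-variance two-sample t test rejects at .05. That test is equivalent to a one-way ANOVA F test with two groups.

```python
import numpy as np
from scipy import stats

# Assumed response probabilities for categories 1..5 (hypothetical)
PROBS = [0.10, 0.20, 0.30, 0.25, 0.15]
CATEGORIES = np.arange(1, 6)


def type_one_error_rate(n_per_group=30, n_reps=2000, alpha=0.05, seed=1):
    """Share of simulated null datasets in which the pooled t test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        # Both groups come from the same distribution, so every rejection is a false positive
        a = rng.choice(CATEGORIES, size=n_per_group, p=PROBS)
        b = rng.choice(CATEGORIES, size=n_per_group, p=PROBS)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_reps


print(type_one_error_rate())
```

A rejection rate close to .05 suggests the test behaves reasonably for that particular distribution and sample size. Rerunning with heavily skewed probabilities or very small groups is one way to explore where it starts to break down.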

Meanwhile, the debate rages on.

### Recommendations

So, what is a researcher with integrity supposed to do? In the absence of a definitive answer, these are my recommendations:

- Understand the difference between a Likert item and a Likert Scale. A true Likert scale, as Likert defined it, is made up of many items, which all measure the same attitude. But many people use the term “Likert Scale” to refer to a single item from that scale. Confusion about what a Likert Scale is, no doubt, has contributed to the debate.
- Proceed with caution. Research the consequences of using *your* procedure on Likert scale data from *your* study design and the variables *you* are measuring. The fact that everyone uses it is not sufficient justification. There are some circumstances and procedures for which it is more egregious than others. You bear the burden of justifying why it’s okay to use numerical procedures for ordinal data.
- At the very least, insist that you’ll only treat it as numerical under certain conditions. All of these must be true: the item has at least 7 values, the underlying construct you’re measuring is continuous, and there is some indication that the intervals between points are approximately equal. Likewise, make sure the other assumptions of your test are reasonable (e.g. normality and equal variance of residuals).
- When you can, run the non-parametric equivalent of your test, or whatever alternative test exists that doesn’t assume numerical data. If you get the same results, you can be confident about your conclusions. So even if you choose to report the numerical results, you can explain, maybe in a footnote, all the tests you ran and the similar results you found. Transparency is always good science.
- If you do choose to use Likert data in a parametric procedure, make sure you have strong results before making claims. Set yourself criteria of larger effect sizes, to ensure that non-zero effects really exist even if you’ve measured your effect with some error. Use a more stringent alpha level, like .01 or even .005, instead of .05. If you have p-values of .001 or .45, it’s pretty clear what the result is, even if parameter estimates are slightly biased. It’s when p-values are close to .05 that the effect of bending assumptions is unclear.
- Consider the consequences of reporting inaccurate results. Will anyone ever read your paper? Will your research be published? Will others use it to shape public policy or affect practices? The answers to these questions can inform the seriousness of potential problems.
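The advice to run a non-parametric equivalent and to use a stricter alpha can be sketched in a few lines of Python. This assumes SciPy is available, and the two groups of 7-point responses are hypothetical. The sketch runs a t test alongside its usual rank-based counterpart, the Mann-Whitney U test, and checks whether both reach the same conclusion at alpha = .01.

```python
from scipy import stats

# Hypothetical 7-point item responses for two groups
group_1 = [5, 6, 7, 6, 5, 7, 6, 6, 5, 7, 6, 7]
group_2 = [3, 4, 2, 4, 3, 5, 4, 3, 2, 4, 3, 4]

ALPHA = 0.01  # stricter than the conventional .05

# Parametric test: treats the 1-7 codes as interval data
t_p = stats.ttest_ind(group_1, group_2).pvalue
# Non-parametric counterpart: uses only the ranks of the responses
u_p = stats.mannwhitneyu(group_1, group_2, alternative="two-sided").pvalue

print(f"t test p = {t_p:.4g}, Mann-Whitney p = {u_p:.4g}")
if (t_p < ALPHA) == (u_p < ALPHA):
    print("Both tests reach the same conclusion at alpha =", ALPHA)
else:
    print("The tests disagree; report both and interpret cautiously")
```

When the two tests agree, reporting the t test with a footnote about the Mann-Whitney result is one way to follow the transparency recommendation.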

### References:

Carifio, J., & Perla, R. (2007). Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes. *Journal of Social Sciences, 2*, 106-116. http://thescipub.com/PDF/jssp.2007.106.116.pdf

Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analyses of variance and covariance. *Review of Educational Research, 42*, 237-288.

Jamieson, S. (2004). Likert scales: how to (ab)use them. *Medical Education, 38*, 1212-1218.

Lubke, G. H., & Muthen, B. O. (2004). Applying Multigroup Confirmatory Factor Models for Continuous Outcomes to Likert Scale Data Complicates Meaningful Group Comparisons. *Structural Equation Modeling, 11*, 514-534.

Shubhi says

Hi Karen,

If my explanatory variable is a 5-point Likert scale and my dependent variable is numerical, can I use this Likert scale variable as one of the explanatory variables, along with other numerical macro variables, in mixed-effects modelling?

David Crow says

As a survey researcher, I’ve found that treating a Likert-type item as an interval (or, perhaps more properly, ratio) is common, especially when there are many items or the items are explanatory variables. (When using a four- or five-point ordinal response scale as the dependent variable, ordinal regression is generally better, in my view.)

The assumption that underlies treating an ordinal response scale as interval is that the categories are equidistant, i.e., that the difference between “Strongly Agree” and “Agree” is the same as the difference between “Strongly Disagree” and “Disagree.” That’s a reasonable assumption in many cases, especially as the number of ordinal response categories increases. Psychometric research shows that beyond seven points, the human mind gets flummoxed when it attempts to make distinctions that are too nice. Imposing the assumption of equidistance does distort the data, but often doesn’t do too much violence, or none at all, while greatly simplifying the analysis.

I especially appreciate, Karen, your first observation, about the difference between “Likert scales” and “Likert items.” I see “Likert scale” misapplied so often that I want to tear my hair in wild despair. I would only add that “Likert item” itself is misused frequently. Technically, an item in a Likert scale must be symmetrical (that is, the lowest value in the scale is just the semantic negation of the highest value, such as “Strongly Disagree” and “Strongly Agree”). Also, there must be a middle, neutral option (such as “Neither Agree nor Disagree”), which necessarily implies a minimum of five points.

So, a four-point ordinal scale like “Not at All,” “Not Much,” “Somewhat,” and “A Lot” is not a Likert item both because it is not symmetrical (with an even number of categories arrayed above and below a hypothetical midpoint on the scale) and because there is no neutral midpoint. Even the four-point ordinal scale “Strongly Disagree,” “Disagree,” “Agree,” “Strongly Agree,” though symmetrical, lacks a neutral midpoint. So, in my view, using the term “Likert-type item” to apply to the latter example is more accurate. The first example is so far from being a Likert item that I wouldn’t even use the word “Likert” at all. But I appreciate your concern for accuracy, Karen!

Bruce Weaver says

Here’s another article to add to the reference list:

Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong? *Journal of Experimental Social Psychology, 79*, 328-348. https://www.sciencedirect.com/science/article/pii/S0022103117307746

Alex Sampratt says

This was really helpful

Steve Provost says

Thank you Karen this is very useful and the lead to reference material is invaluable.

DEBJANI ROY says

Can we use the scale value of the likert items in the model?

DEBJANI ROY says

Suppose I have a binary response and a set of covariates measured on likert scale. Now by treating the likert items as a continuous variable we can get a density plot and by observing the density plot we can make a distributional assumption. Based on that distribution we can obtain the scale values of the item. Now can we use those scale values of the likert items in the model ?

Deja L. says

In fact, I had read somewhere that it was not Likert Scale but Likert Rating.

There must be some difference behind it, though.

Deja

Audrey says

Is it possible to use a moderator analysis with dichotomous moderator variables with Likert scales? I at first thought Likert scales were only ordinal, but Schwartz, Wilson, and Goff (2019) say that they are in the majority in considering Likert scales to be interval, and that parametric tests can be run on Likert scales. For example: if looking at the relationship between two summative scores on two separate Likert scales and then considering the significance of a dichotomous moderator (let’s say gender). Can this be done using moderator analysis with a dichotomous moderator variable?

Karen Grace-Martin says

Sure. All moderator analysis means is you add an interaction to your model. But this can be done in all regression models (even ordinal logistic regression). So you don’t have to assume the likert variable is interval to have moderators.
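Karen’s point that a moderator is just an interaction term can be shown in a few lines. This sketch uses NumPy least squares on simulated, noise-free data. The coefficient values and the 0/1 coding of the moderator are assumptions chosen for illustration. The same product column could be added to an ordinal logistic model instead of a linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictor (e.g., a summed scale score) and a 0/1 moderator (e.g., gender)
x = rng.uniform(5, 35, size=n)
group = rng.integers(0, 2, size=n)

# Simulated, noise-free outcome: slope of x is 0.5 in group 0 and 0.5 + 0.3 = 0.8 in group 1
y = 2.0 + 0.5 * x + 1.0 * group + 0.3 * x * group

# Design matrix: intercept, x, moderator, and their product (the interaction term)
X = np.column_stack([np.ones(n), x, group, x * group])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(["intercept", "x", "group", "x:group"], coefs):
    print(f"{name}: {value:.3f}")
```

The `x:group` coefficient is the moderation effect: how much the slope of `x` changes between the two groups. With real, noisy data you would judge it by its standard error or p-value rather than expecting exact recovery.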

Shawn Acheson, Ph.D. says

Could you provide the complete reference for “Schwartz, Wilson, and Goff (2019)”

Many thanks,

Shawn

Kamal says

Seems to refer to: Schwartz, B.M., Wilson, J.H. and Goff, D.M., 2018. An EasyGuide to research design & SPSS. Sage Publications.

https://books.google.es/books?id=VQkuDwAAQBAJ&pg=PT5&lpg=PT5&dq=Schwartz,+Wilson,+and+Goff+(2019)&source=bl&ots=-7qZ4jeGwp&sig=ACfU3U046NBkNt51RN4V79Rc7q7JjrLGwg&hl=en&sa=X&ved=2ahUKEwjX47i1l_vjAhVXo54KHQNeA-YQ6AEwD3oECAkQAQ#v=onepage&q=likert&f=false

Felipe says

Hi Karen,

I’ve read some of your answers and they helped me a lot! Thank you very much.

Still, I have a question:

Let me introduce you to my problem:

I have a survey with a 1-5 ordered DV and, let’s say, ten 1-5 ordered IVs. So, if I understood correctly, my best option is to use an ordinal regression model (polr function in R).

But then, my real problem is: how can I determine the most important IVs?

Spearman rank correlation is not enough, and the coefficients (given on a log scale) are not enough either.

I’ve tried to estimate the Proportion Chi-square of each predictor, but I’m not sure if that’s the correct thing to do.

I hope you can help me! Beforehand, thanks!

Karen Grace-Martin says

Hi Felipe,

Believe it or not, that’s actually a tough question, and I’d have to ask you a lot of clarifying questions first. If that is something you’re interested in, you might want to sign up for our membership program. It includes both a forum and weekly live Q&A sessions with our team of statistical consultants. https://www.theanalysisfactor.com/membership-program/

Ian says

Hi! I am doing a regression analysis involving a Likert scale. For every variable, there are 5 Likert items. Is it possible to run a regression analysis on this data?

Colin Reynolds says

Hi Karen,

I’m just a lowly practitioner in the government sector, but my work sometimes informs decisions that affect real people, so I have some aspiration to get things right. My problem is that while I trust the statistical tools applied to them, I have more fundamental issues with Likert scales, in that I find it very hard to be convinced that they validly meet a basic definition of ‘meaningful evidence’ of anything. Happy to explain why, but won’t bore you with that unless you are interested. My real issue is simply that, valid or not, it is easy to demonstrate that Likert scale data points are of very low data quality compared to many other forms of ‘qualitative’ data. This type of quasi-ordinal data construct has not actually been around for very long in the bigger scheme of things, but many social scientists appear to be obsessed with them to the point of failing to look for something better (and indeed, getting very cross when anyone even suggests that we may be able to construct qualitative measures with far better data quality). I can understand this from a ‘convenience’ perspective, but from a ‘science’ perspective it is like climbing out to the end of a branch and then getting stuck. Whether he was right or wrong to ‘invent’ Likert scales is not that relevant; new ideas are always needed and researchers are right to trial them. What I don’t think the late Prof. Likert would be happy to hear is that obsession with the convenience of ‘deriving data’ from Likert scales is impeding new paradigm shifts and improvements in qualitative research. I have been doing my job for quite a while, and have had extensive personal experience with the ‘passion’ with which the two camps you mention in your article defend their respective positions. What if they are both missing the point?
I guess the point of my posting is just to look for your reaction to my position, and I am hoping it is not the normal one I get when I suggest that there may be better tools for qualitative research than Likert Scales, which is excommunication as a heretic.

Cheers,

Colin

Karen says

Hi Colin,

I just wanted to say that I agree wholeheartedly. My understanding is that a true “Likert Scale” as developed by Likert actually has solid psychometric properties, but requires a lot of testing and is composed of many items. Not just a single 1-5 item.

But yes, it’s not the only, or best, option.

moein says

I have Likert scale responses (1-5 rating). If I need to check the normality of the distribution of responses, how do I do that? A K-S test?

Karen says

QQ Plots (aka Normal Probability Plots) are better than K-S tests for checking normality:

https://www.theanalysisfactor.com/anatomy-of-a-normal-probability-plot/
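For readers who want to see what a normal probability plot is made of, here is a minimal Python sketch using only the standard library (`NormalDist` needs Python 3.8+). The responses are hypothetical, and the (i - 0.5)/n plotting positions are one common choice among several. The sketch pairs each sorted observation with the matching standard normal quantile. Plotting those pairs gives the QQ plot, and points near a straight line suggest approximate normality. With a 1-5 item the points fall in flat steps, one per category, which is itself a visual reminder that the data are discrete.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical 1-5 ratings
responses = [3, 4, 2, 5, 3, 4, 4, 3, 1, 4, 5, 3, 2, 4, 3]


def qq_points(data):
    """Return (theoretical normal quantile, observed value) pairs for a QQ plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, sd 1
    # Plotting positions (i - 0.5) / n for i = 1..n
    return [(std_normal.inv_cdf((i - 0.5) / n), value) for i, value in enumerate(ordered, start=1)]


points = qq_points(responses)
m, s = mean(responses), stdev(responses)
for z, value in points:
    # Reference line: the value expected at this quantile if the data were normal
    # with the sample mean and standard deviation
    print(f"{z:6.2f}  observed={value}  line={m + s * z:.2f}")
```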

Ohaegbu O Kingsley says

Please, can you help me understand how to determine the independent and dependent variables when using a Likert scale to test a hypothesis? If possible, send me a lecture video to help me out.

Thank you

Maria says

Hello! Thank you so much for your post, it was very helpful. I wonder if you can help me with my question. I conducted a study, and I need to use a MANOVA since I have 4 dependent variables to compare between two groups. Three of the 4 dependent variables each consist of 4 different questions measured on a 5-point Likert scale, so I used the median command to combine them and obtain my dependent variables. As a result, I obtained different scales for each dependent variable, such as: 2.50/ 3/ 3.50/ 4/ 4.50/ 5; 3/ 3.50/ 4/ 4.50/ 5; 1/ 2/ 3/ 4/ 5. I do understand why I have such results. However, my question is whether I can use such scales directly in my MANOVA, or should I recode them in some way? MANOVA requires dependent variables on continuous scales, but how can I show that mine qualify if they are measured on a 5-point Likert scale? Is it possible? I would really appreciate your help. Thank you!

Bruce Weaver says

Here is another article you could cite in support of using parametric tests with Likert scales, and even items.

Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. *Advances in Health Sciences Education, 15*(5), 625-632. http://link.springer.com/article/10.1007/s10459-010-9222-y

Er Gee says

Thank you for the link, Mr. Bruce.

asdfasf says

great link, thx

Johnny says

Hi – great article, thank you! I’ve got a 20-item scale, each question in the form of a Likert item (ranging from 0-3), which means there are 80 possible scores. I’d like to use this variable as a dependent variable in some sort of regression with several predictor variables. Technically this score is “ordinal” since it came from a sum of Likert items, but an ordinal regression with 80 possible “categories” is a bit much. The score is also distributed very much like a binomial distribution and not at all normal (could I use negative binomial regression perhaps?). Any advice on how to regress this would be appreciated!

Karen says

Hi Johnny,

I don’t think the full 80-point scale would be considered ordinal. It’s just when you use a single item with values of 0-3.

I personally would start with a linear regression on this variable, but check assumptions. Often the issue with this type of variable is that too many values are at the boundaries (0 or 80), so you don’t have normal residuals. But not always, so it’s a good idea to check.
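Karen’s suggestion (fit a linear regression on the total score, then check the assumptions) might look like this in Python with NumPy. Everything here is a hypothetical sketch. The simulated item responses, the cut points, the single predictor, and the two checks (share of totals at the floor or ceiling, plus residual skewness) are assumptions, not a prescribed procedure. With 20 items each scored 0-3, the possible totals run from 0 to 60, so those are the boundaries the sketch checks.

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_items = 300, 20
max_total = 3 * n_items  # 20 items scored 0-3 give totals from 0 to 60

# Hypothetical predictor and a latent trait that depends on it
predictor = rng.normal(size=n_people)
latent = 0.8 * predictor + rng.normal(size=n_people)

# Assumed cut points turn latent trait + item-level noise into 0-3 responses
cuts = np.array([-0.5, 0.5, 1.5])
items = np.searchsorted(cuts, latent[:, None] + rng.normal(size=(n_people, n_items)))
total = items.sum(axis=1)

# Ordinary least squares: total = b0 + b1 * predictor
X = np.column_stack([np.ones(n_people), predictor])
coefs, *_ = np.linalg.lstsq(X, total, rcond=None)
residuals = total - X @ coefs

# Check 1: share of people sitting at the floor (0) or ceiling (max_total)
at_bounds = float(np.mean((total == 0) | (total == max_total)))
# Check 2: sample skewness of the residuals (near 0 for a symmetric distribution)
z = (residuals - residuals.mean()) / residuals.std()
skewness = float(np.mean(z ** 3))

print(f"slope = {coefs[1]:.3f}")
print(f"share at floor/ceiling = {at_bounds:.3f}")
print(f"residual skewness = {skewness:.3f}")
```

A normal probability plot of `residuals` is the usual visual follow-up. If many totals pile up at a boundary, a count or ordinal model may be worth comparing against the linear fit.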

Max Thomsen says

Thanks a lot, Karen!

Max Thomsen says

Dear Karen,

thanks for your really helpful article. In point 3 you say that in order to (more or less) safely consider an ordinal predictor continuous it should have at least 5 to 7 points. Can you name a reference for that statement?

Thanks and best regards, Max

Karen says

Hi Max,

The Lubke & Muthen article listed above discusses it. It’s been a while since I read it, but I believe they did it in the context of factor analysis, not predictor variables in regression.

Haitham Elsaid says

My dependent variable is measured on a 5-point Likert scale, and my independent variable is measured on a 5-point Likert scale. Is it appropriate to run a linear regression analysis on such data? What about multiple regression?

Timi Joy says

Can anyone suggest an academic study that supports using Likert scales with different numbers of points (5, 7, 8, 11) in the same study?

jordyn says

Hi,

I ran PCA on Likert scale variables (answers ranged from 0-6). I found skewness and kurtosis OK on all variables. When I write up my paper, do I need to justify using Likert scale items in PCA, or is it so common that no one justifies it anymore? Would I cite the Lubke article above to show that I could use Likert data as continuous?

Thanks

Madushani Gunathilake says

Can I use regression analysis for my 5-point Likert scale data? And can I link 5-point Likert scale data with continuous variables?

aslam says

I have an R value of 1. How can I reduce this R value? Please help.

Kerry Dungay says

Hi, loved your article, really helpful. I have a question. I have some confusion over what exactly is summed. If you have a questionnaire with, say, 7 Likert items (questions), is the summed amount for the population the interval data? For example, if it was a job satisfaction survey and I wanted to compare satisfaction between males and females, could I sum the overall score for females, then males, and compare means as interval data? I think I am confused about at what point the data is summed to become interval data: is it at the individual level or the population level? The other thing: as a Likert survey uses closed questions, the research method is quantitative, but at a statistical level it is qualitative, is this right? Thanks, this is for a research assignment due soon, and I have to explain my data analysis method, so I desperately need advice!

Karen says

Hi Kerry,

Well, first of all, you don’t have population data, just samples. But you would sum scores for individuals. Now, there is more to whether it’s truly interval. To be sure, I would suggest reading more in the psychometric literature. As for the quantitative/qualitative question, if you need that for an assignment, I think you need to figure that out on your own.
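To illustrate the distinction Karen draws (scores are summed within each respondent, not across the sample), here is a small standard-library Python sketch with hypothetical answers to a 7-item, 1-5 questionnaire. Each person gets one total, and the group comparison then uses those per-person totals. Whether the totals can be treated as interval data is the separate psychometric question she mentions.

```python
from statistics import mean

# Hypothetical answers: one list of seven 1-5 item responses per respondent
respondents = [
    {"sex": "F", "items": [4, 5, 4, 3, 4, 5, 4]},
    {"sex": "F", "items": [3, 3, 4, 2, 3, 4, 3]},
    {"sex": "M", "items": [2, 3, 3, 2, 2, 3, 2]},
    {"sex": "M", "items": [4, 4, 3, 4, 3, 4, 4]},
    {"sex": "F", "items": [5, 5, 5, 4, 5, 5, 4]},
    {"sex": "M", "items": [3, 2, 3, 3, 2, 3, 3]},
]

# Step 1: sum the items within each respondent (possible range 7 to 35)
for r in respondents:
    r["total"] = sum(r["items"])

# Step 2: compare the per-person totals between groups
for sex in ("F", "M"):
    totals = [r["total"] for r in respondents if r["sex"] == sex]
    print(sex, "totals:", totals, "mean:", round(mean(totals), 2))
```

The group means here are means of individual totals, which is what a t test or ANOVA on the scale score would compare.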

Dominique says

Hi Kerry, I’m a bit late adding to this question (which should also have given you ample time to research the literature yourself, and you might even know more about it than me at this point :).

So I just wanted to say a bit about the tip of the iceberg that I’m aware of. I’m a ‘mere’ researcher in that sense, but I also have some background in teaching statistics to bachelor students during my psychology major, which brought me into contact with some of the more controversial practices out there, such as treating a Likert scale (ordinal, at the very least in terms of how the Likert item is coded) as an interval scale, and whether that is at all justified. Since then statistics has become more of a background thing for me, and I do admit we treat it as an interval variable in many cases in cognitive psychology, but always with a nagging voice at the back of my head, stemming from lecturing students that it’s wrong even if often done, telling me that it’s not a right way of treating the data at all (in most cases).

Now, using Likert *items* (!) as continuous predictors is very hard to justify in my opinion. The following might all be old news by now, or even just plain wrong, but it makes a lot of intuitive sense to me. Because I haven’t been keeping up to date on many statistical debates in academia, these are primarily my own thoughts and conclusions, which I think are hard to deny (not impossible, certainly, but it would still be quite hard to convince me otherwise, I think. But feel free to provide counterarguments.).

1. People differ in the way they interpret the labels attached to a response option (and consequently in the ‘distance’ it would translate to on the underlying true thing or trait being measured by these questionnaires, e.g. Extroversion or Optimism, which I personally think are very good examples, as in my experience extroversion and optimism can most certainly not be reduced to 2 categories, or however many categories you would like to create). The exception is perhaps your outliers, but using outliers as a group seems very risky, especially because you usually don’t know for sure what causes those outliers and you simply won’t have enough of them in your sample :b.

2. Added to that, you can’t even assume that the difference between the Likert item points is consistent, either within or across participants. The difference in ‘agreement’ one person thinks there is (or experiences subjectively) between strongly agreeing and agreeing completely might be smaller than the difference at the point where the opinion changes from “slightly (dis)agreeing” to the “neutral” point, or, if there is no neutral option, the difference between slightly agreeing and slightly disagreeing. The reason is that the change in attitude is no longer just about the strength of agreement or disagreement, but also a change in what you claim to hold true. For this very reason, I believe the leap from slight disagreement to slight agreement is somewhat, if not much, larger than the leap from agreeing strongly to agreeing completely. Similarly, the change from slightly (dis)agreeing to neutral is a qualitatively different kind of leap in opinion: between having an opinion, however slight, and having no real opinion on the matter. Now it is not even about the strength of agreement and disagreement, but about a sudden absence of opinion on a statement, which would therefore be thought to measure something else. At least to me this makes a lot of sense intuitively. I have no way to prove any of this, but to me, assuming that those Likert points are at pretty much the same distance from each other on an equivalent continuous measurement seems a little ridiculous, to be fair (yet we all still keep making these assumptions for the sake of convenience, at least at the moment; I think this will change pretty soon, though).

3. Some more pragmatic (:P) advice. Once you reach 80 categories, rather than trying to do it the ‘ANOVA way’, I think it’s best to deal with it by just learning simple and multiple regression :). They are not hard to learn and interpret, even if some older researchers seem to think they would be hard to learn, cost valuable time, and so on. Regression is not hard, not at all. Nor are the non-parametric versions very hard to understand. It’s more about getting used to the way the results are reported and how to interpret the non-parametric statistics. I really see no good excuse for not keeping up to date with current ways of analyzing data, especially since they have many apparent advantages and seem easy to learn. I bet you can learn how to do regression in less than a day, for example; heck, first-year bachelor students learn the basics nowadays in one or two 2-hour meetings. Small price to pay for more reliable results.

4. A bit off topic, but still worth mentioning, is the multilevel analysis approach to analyzing your data. It might also solve some of the above issues, as the source of the problem seems to simply come down to grouping people together into categories and therefore getting rid of individual variability. On the other hand, it is the very fact that there are continuous variables in the analysis that allows you to capture individual differences, so it wouldn’t really solve the median split and Likert scaling issues after all, I guess. Then again, once you are capable of doing multilevel analysis of your data, which from my point of view and from what I learned about it is a far superior way of analyzing data than any kind of ANOVA or regression, you should ask yourself which approach really suits your data. I’d highly recommend looking into it and considering it as a possible solution to the question you had 3 years ago ;), but also for future research decisions.

Also, when I started my major (in psychology, later cognitive psychology/neuroscience), we learned (and the students still learn) about the theory of regression, including different versions such as logistic regression, and importantly which technique is most suited to the data they want to analyze, for the very reason that you would lose valuable ‘data’ (in this case variance, or sample size, depending on how you approach it) when reducing a continuous variable to a categorical variable in your model.

I think the newer generations of researchers will, for that reason alone, be much more skilled in handling such problems, thanks to this extra knowledge and possibly courses in multilevel analysis, which is becoming increasingly popular over here.

And as old habits die hard for those who know nothing but ANOVA, it seems like the best approach to solving ‘the problem’ is to just let time take care of it, to always be critical of results that used median splits or Likert scales treated as interval variables, and to draw your own more ‘hesitant’ or careful conclusions. Trying to change their opinions and way of doing research seems like a waste of time and energy.

Perhaps it’s different in the US if regression has still not become part of the standard statistics course. I can hardly imagine that to be true, though. But if it is, then, well, step up guys. Those are essential skills to have in analyzing data!

So on the individual level, learn some more techniques that can handle other types of data.

On the university level have students learn them as well as part of their education if it isn’t part of it already.

And in a global sense, time will take care of the stubborn ones who find it hard to let go of the old ways.

Best,

Dominique

Faisal says

I have a question, please reply. I am doing a statistical analysis. My independent variables are 5-point Likert scales and my dependent variable is binary. Should I use binary logistic regression? What options should I select?

Karen says

Hi Faisal,

I would need a lot more information to actually suggest an analysis. If your outcome is binary, then indeed logistic regression is one possibility. But it depends on a lot of other questions, including “what is it you want to test?”

JUSTICE MOSES K. AHETO says

Some papers say that one can combine Likert-type questions into a Likert scale by summing the responses under each construct to form scores, which converts the data from an ordinal scale to an interval scale on which parametric tests like ANOVA, regression, etc. can be conducted.

What about that?

Thanks

jsny says

How can I use data collected using a Likert scale to do a correlation?

Karen says

Depends what you mean by likert scale. If you mean something like a 1-5 scale item, your best bet would be a spearman rank correlation. No assumptions of normality there.

rohail says

How can we convert the data into an independent variable so that I can use factor analysis? As I am new to this software, can somebody help me with this? I have collected data on customer satisfaction on a 5-point Likert scale. Please help.

Karen says

Rohail, I’m not entirely sure what you’re asking, but generally people do use likert data for factor analysis.

Harleen says

Hi

I have done factor analysis on the data collected (Likert items). Now I am lost as to how I should proceed. SPSS has given me 9 factors out of 50 ordinal variables.

Can I apply regression or ANOVA on such data?

Also, can I subdivide a factor into two or more factors by doing factor analysis again on the items which constitute a factor (originally computed)? E.g., brand image could be subdivided into quality, product attributes, and so forth. Can I do so?

Karen says

Hi Harleen,

There’s a lot to using the results from factor analysis in other analyses. More than I could ever answer here (it’s a book, really).

I would strongly suggest getting this book, even if you don’t use SAS. A Step-by-Step Approach to Using the SAS System for Factor Analysis and Structural Equation Modeling by Larry Hatcher. He really explains everything step-by-step.

I recently suggested it to a client who needed to use Factor Analysis, and she said it cleared up all her confusion.

Karen

zik oseni says

Can someone tell me why a firm’s age is used as a proxy for information asymmetry? You can post your response here or email me at zikoseni@yahoo.com.

Thanks

Alireza says

My question is: if our data were parametric, can we use Likert Scale data in Factor Analysis directly? Otherwise, to identifying important variables in my study with my Likert Scale data, what should I do?

Thank you

Karen says

Alireza,

So are you asking if you can use Factor Analysis for Likert Scale data?

Theoretically, Likert items do not meet the assumptions for a Factor Analysis. That Lubke and Muthen paper referenced above, however, found that in some situations, the results are quite valid. I would suggest reading that paper and seeing if your data fit the situations where it works well.

Karen

ZIK OSENI says

I am a student. Can someone help me locate free statistical software to analyze data I gathered using a Likert scale? I am working on asymmetric information in the capital market. I can be reached via zikoseni@yahoo.com. Thank you

Karen says

Hi Zik,

If you need something free, you have two choices:

PSPP is an open-source version of SPSS Base. Easy to use, but limited.

R requires more programming, but can do much, much more.

Andy says

That Carifio and Perla paper looks handy – ta for sharing!