*by Annette Gerritsen, Ph.D.*

In an earlier article I discussed **how to do a cross-tabulation in SPSS**. But what if you do not have a data set with the values of the two variables of interest?

This can happen, for example, when you critically appraise a published study that reports only proportions and denominators.

In this article I will show how SPSS can produce a cross table and run a Chi-square test in both situations, and that the results are exactly the same.

### ‘Normal’ dataset

If you want to test whether there is an association between two nominal variables, you use a **Chi-square test**.

In SPSS you just indicate that one variable (the independent one) should come in the row, and the other variable (the dependent one) should come in the column of the cross table. Then you ask for row percentages and the Chi-square statistic.

The output will give you the cross table with the counts and row percentages, and a table with the value of the Pearson Chi-square and its p-value. [Don't forget to check whether the **Chi-square test is valid**: no more than 20% of the *expected* frequencies should be below 5, and none should be below 1.]
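As a rough illustration of that validity check, the sketch below computes the expected frequency of each cell (row total × column total ÷ grand total) and flags the table when more than 20% of the expected frequencies are below 5 or any is below 1. It is a minimal Python sketch, not SPSS output; the function names `expected_counts` and `chi_square_is_valid` are hypothetical, and the example uses the medication-by-disease table constructed later in this article.

```python
def expected_counts(table):
    """Expected frequency of each cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_is_valid(table):
    """Rule of thumb: at most 20% of expected counts below 5, none below 1."""
    expected = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and min(expected) >= 1


# Medication (rows: A, B) by disease (columns: yes, no)
table = [[91, 185],
         [135, 257]]
print(expected_counts(table))
print(chi_square_is_valid(table))  # True: the smallest expected count is about 93
```

The same functions work for any R × C table, since the expected counts only depend on the row and column totals.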

For example, to see if the chance of getting a disease (e.g. an infection as complication after an operation) is different between those using medication A as prophylaxis and those using medication B, you would ask for a cross table of disease by medication. The SPSS output is printed below.

### Only proportions and denominators available

But how do you do a Chi-square test when you only have proportions and denominators available?

For example, you know from the literature that 33.0% of 276 people using medication A got the disease, while 34.4% of the 392 persons getting medication B got the disease.

How do you know if this is significantly different or not?

The first step is to construct the cross table yourself. Determine what figure belongs in the cell where variable 1 (medication) equals 1 and variable 2 (disease) equals 1. This is 0.330 × 276 = 91.08, which rounds to 91.

Do the same for the cell where variable 1 equals 2 and variable 2 equals 1 (0.344 × 392 = 134.85, which rounds to 135). To get the figure for the cell where variable 1 equals 1 and variable 2 equals 2, subtract 91 from 276, which gives 185. Follow the same procedure for the cell where variable 1 equals 2 and variable 2 equals 2 (392 − 135 = 257).
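The same arithmetic can be written as a small helper, assuming each group is given as a (proportion, denominator) pair. This is a hypothetical Python sketch; because published percentages are rounded, the reconstructed counts can be off by one from the true counts, which may shift the Chi-square value slightly.

```python
def counts_from_proportions(groups):
    """Rebuild a groups-by-outcome table from (proportion, denominator) pairs.

    Each row is [number with the outcome, number without the outcome].
    round() is used because proportion * denominator is rarely a whole number.
    """
    table = []
    for proportion, denominator in groups:
        with_outcome = round(proportion * denominator)
        table.append([with_outcome, denominator - with_outcome])
    return table


# 33.0% of 276 on medication A and 34.4% of 392 on medication B got the disease
print(counts_from_proportions([(0.330, 276), (0.344, 392)]))
# [[91, 185], [135, 257]]
```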

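Next, enter these data into SPSS as three variables, with medication coded 1 = A and 2 = B, and disease coded 1 = yes and 2 = no:

| medication | disease | count |
|---|---|---|
| 1 | 1 | 91 |
| 2 | 1 | 135 |
| 1 | 2 | 185 |
| 2 | 2 | 257 |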

In order to have SPSS produce the cross table and calculate the Chi-square value, you use the ‘weight by’ option. The SPSS syntax is printed below.

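```
* Each row of the data set stands for 'count' identical cases.
WEIGHT BY count.
* Medication in the rows and disease in the columns with counts, row percentages, Chi-square and risk estimates.
CROSSTABS
  /TABLES=medication BY disease
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ RISK
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.
```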

And you will get exactly the same output as with the ‘normal’ dataset. So when you need to know whether there is an association between two nominal variables and you do not have the original dataset, knowing the proportions and denominators is enough to get your answer.
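The Pearson Chi-square that this syntax reports can also be checked by hand. Below is a minimal Python sketch of the Pearson Chi-square test for a 2 × 2 table, using only the standard library. It assumes exactly one degree of freedom, for which the p-value equals `erfc(sqrt(chi2 / 2))`; it does not reproduce the continuity-corrected or exact tests that SPSS also prints for 2 × 2 tables.

```python
from math import erfc, sqrt


def pearson_chi_square_2x2(table):
    """Pearson Chi-square and p-value for a 2 x 2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Shortcut formula: N * (ad - bc)^2 / (product of the four margins)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the Chi-square distribution with 1 df
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


table = [[91, 185],    # medication A: disease yes, disease no
         [135, 257]]   # medication B: disease yes, disease no
chi2, p = pearson_chi_square_2x2(table)
print(f"Chi-square = {chi2:.3f}, p = {p:.3f}")
```

For this table the Chi-square is about 0.16 with p ≈ 0.69, so the difference between 33.0% and 34.4% is not statistically significant at the usual 5% level.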

**About the Author:** With expertise in epidemiology, biostatistics and quantitative research projects, Annette Gerritsen, Ph.D. provides services to her clients focusing on the methodological soundness of each phase of an epidemiological study, to ensure valid answers to the proposed research questions. She is the founder of *Epi Result*.
### Comments

Dear Ms. Annet:

I have a problem. I have a total of 121 respondents in a survey. Of those, 42 responded early to my questionnaire, while the remaining 79 respondents replied later. How can I use a chi-square test to find the significance value?

Thanks

Dear Tareque,

Thanks for your comment.

However, a chi-square test is used when you compare proportions between two or more groups, and you only have one group.

I am not sure what your hypothesis is; are you testing whether the numbers of early and late respondents differ? In that case, you could do a test of proportions. See the following link for a step-by-step explanation of how to do this manually: http://davidmlane.com/hyperstat/B71928.html. For help doing this in SPSS, see the following link (binomial test): http://www.ats.ucla.edu/stat/spss/whatstat/whatstat.htm. I hope this will assist you.
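For Tareque's question, an exact binomial test of 42 early respondents out of 121 against an assumed 50/50 split could be sketched as below. This is a hypothetical Python illustration, not the procedure from the linked pages; the null proportion of 0.5 is an assumption about what "not the same" means, and the two-sided p-value adds up all outcomes that are no more likely than the observed one.

```python
from math import comb


def binomial_test_two_sided(successes, n, p_null=0.5):
    """Exact two-sided binomial test.

    Adds up the probabilities of every possible count that is no more
    likely under the null hypothesis than the observed count.
    """
    probs = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = probs[successes]
    # Small tolerance so that ties are not lost to floating-point error
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# 42 early respondents out of 121 (assumed null: half respond early)
p = binomial_test_two_sided(42, 121)
print(f"two-sided p = {p:.4f}")
```

With `p_null = 0.5` the binomial distribution is symmetric, so this gives the same result as doubling one tail.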

Kind regards,

Annette

Dear Annet,

I would like to compare two groups, an intervention group and a control group, using the Chi-square test. How can I use the chi-square test to find the significance value (for some specific variables), i.e. whether there is a significant difference between the groups?

Regards,

Shameem

Dear Shameem,

If you want to compare an intervention group and a control group on a categorical (not continuous) outcome variable, you can indeed use the Chi-square test. If you put the group (independent) variable in the rows of the table (first row intervention, second row control) and the outcome (dependent) variable in the columns, and ask for row percentages, you will see whether the percentages for each value of the outcome variable differ between the 2 groups. The Chi-square test then tells you whether these differences are statistically significant.

Kind regards,

Annette

Is there any journal article about the test of proportions related to nursing?

Dear Annet,

I am testing just three hypotheses in my research, and I want to use 5 questions from my questionnaire to test a hypothesis. How can I do this?

Thanks

Hi Mustapha, I’d need a lot more info to answer that. It’s all in the details.

Karen

I have several pieces of data comparing males and females in yes/no situations. I want to find out whether there is a statistically significant difference between males and females.

I have used SPSS and entered my data as follows:

| Sex | Answer | Count |
|---|---|---|
| Female | Yes | 20 |
| Male | Yes | 6 |
| Female | No | 33 |
| Male | No | 21 |

I then go to Analyze > Nonparametric Tests > Chi-Square, select the column with the numbers (the count), and press OK.

This always gives me a chi-square value of .000, df = 3 and Asymp. Sig. = 1.000.

These values are the same NO MATTER what count values I put into SPSS.

However, my tutor has said I cannot use crosstabs.

I have been told that I should assume the values I get from simply using Chi-Square are correct.

But all the values I get seem too good to be true.

Help ??

Thank you in advance

I’m not sure why you shouldn’t use crosstabs. That’s what I would use, unless I’m not understanding something….
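A likely explanation for the identical output, offered here as an assumption rather than a diagnosis: if the count column is chosen as the test variable without weighting, the one-way Chi-square test treats the four values 20, 6, 33 and 21 as four categories that each occur once. Observed and expected frequencies are then all 1, giving Chi-square = 0, df = 3 and p = 1 whatever the counts are. The hypothetical Python sketch below reproduces that, and then computes the 2 × 2 Pearson Chi-square that a weighted crosstab of sex by answer would report.

```python
from collections import Counter
from math import erfc, sqrt

# Data as entered in the comment: (sex, answer) -> count
counts = {("Female", "Yes"): 20, ("Male", "Yes"): 6,
          ("Female", "No"): 33, ("Male", "No"): 21}

# 1) One-way test on the count column itself (no weighting):
#    each distinct value is a category that occurs exactly once.
frequencies = Counter(counts.values())
expected = len(counts) / len(frequencies)
gof_chi2 = sum((obs - expected) ** 2 / expected for obs in frequencies.values())
gof_df = len(frequencies) - 1
print(gof_chi2, gof_df)  # 0.0 3

# 2) 2 x 2 crosstab of sex by answer, as a weighted CROSSTABS would build it
a, b = counts[("Female", "Yes")], counts[("Female", "No")]
c, d = counts[("Male", "Yes")], counts[("Male", "No")]
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p = erfc(sqrt(chi2 / 2))  # 1 degree of freedom
print(f"Chi-square = {chi2:.2f}, p = {p:.2f}")
```

All expected counts in the 2 × 2 table are above 5 (the smallest is about 8.8), so the validity rule from the article is met for this crosstab.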

Dear Annet,

I want to see whether a certain treatment affects a number of linguistic skills. I suppose I can use the chi-square test to determine whether the treatment caused significant differences between these groups. My data were originally frequencies, which I transformed into percentages. Can you please explain, or direct me to a link that explains, how to arrange the data in SPSS?

Thank you

I have a 12 × 7 table of variables. I have 279 completed questionnaires, but 79.6% of my cells are still less than 5. What test could I use instead of the chi-square to test for significance?

Hi Stephanie,

First of all, the assumption is not that observed values are greater than 5, it’s that expected values are. So I’ll assume that’s your situation.

A Fisher’s Exact would work, but it’s going to take a while to run with a table that size. If you have a Monte Carlo option in your software, that would be the best choice.
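The idea behind a Monte Carlo option can be sketched as a permutation test: keep the row and column totals fixed, shuffle which column each case falls into, and count how often the simulated Pearson Chi-square is at least as large as the observed one. This is a minimal, hypothetical Python sketch (the function names, number of simulations, seed and example table are arbitrary choices); it estimates the exact p-value of the Pearson statistic and is not necessarily the algorithm any particular package uses.

```python
import random


def pearson_chi2(table):
    """Pearson Chi-square statistic for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            if expected > 0:  # skip cells in an all-zero row or column
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def monte_carlo_chi2_pvalue(table, n_sim=10_000, seed=None):
    """Monte Carlo p-value for the Pearson Chi-square with fixed margins.

    Shuffling the column labels of the individual cases keeps every row
    and column total unchanged, so each shuffle is a random table drawn
    under the hypothesis of no association.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row, column) pair per case
    row_labels, col_labels = [], []
    for i in range(n_rows):
        for j in range(n_cols):
            row_labels += [i] * table[i][j]
            col_labels += [j] * table[i][j]
    observed = pearson_chi2(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if pearson_chi2(sim) >= observed - 1e-9:  # tolerance for rounding
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (hits + 1) / (n_sim + 1)


# Hypothetical small table with several low expected counts
example = [[3, 1, 0],
           [0, 2, 4]]
print(monte_carlo_chi2_pvalue(example, n_sim=5_000, seed=42))
```

Increasing `n_sim` reduces the simulation error of the estimated p-value, and fixing `seed` makes the result reproducible.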

Dear Annet,

I would like to submit a project report on “motivation and its impact on employee performance”. How can I use the chi-square test?

Dear Anette,

I have 2 groups of subjects who each received a different treatment. The first group consisted of 239 subjects and the other group of 119 subjects. What was measured during the experiment was the occurrence of events: in the first group of 239 subjects, 730 events occurred during the study, and in the second group of 119 subjects, 393 events occurred. Should I use a chi-square test to check whether there are more events in one group than in the other?

Thanks for your advice

Dear Tientje,

It would be best to do a Poisson regression, as you are dealing with count data. Check out this link for some help: http://www.theanalysisfactor.com/poisson-and-negative-binomial-regression/
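Assuming every subject was followed for the same length of time, a Poisson regression with only a group indicator and log(number of subjects) as the offset reduces to comparing the two event rates: the group coefficient is the log of the rate ratio, with Wald standard error sqrt(1/events₁ + 1/events₂). The hypothetical Python sketch below computes that by hand for Tientje's numbers; it assumes no overdispersion, which is exactly what the negative binomial model in the linked article relaxes.

```python
from math import erfc, exp, log, sqrt


def compare_poisson_rates(events1, subjects1, events2, subjects2):
    """Rate ratio (group 1 vs group 2) with a Wald test and 95% CI.

    Assumes Poisson counts and equal follow-up per subject, so the number
    of subjects acts as the exposure (the offset in a Poisson regression).
    """
    rate1 = events1 / subjects1
    rate2 = events2 / subjects2
    log_rr = log(rate1 / rate2)
    se = sqrt(1 / events1 + 1 / events2)  # Wald SE of the log rate ratio
    z = log_rr / se
    p_value = erfc(abs(z) / sqrt(2))      # two-sided normal p-value
    ci = (exp(log_rr - 1.96 * se), exp(log_rr + 1.96 * se))
    return rate1 / rate2, ci, p_value


rr, ci, p = compare_poisson_rates(730, 239, 393, 119)
print(f"rate ratio = {rr:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}, p = {p:.3f}")
```

If the events are more variable than a Poisson model allows (for example, a few subjects with many events), the standard error above is too small and a negative binomial model is the safer choice.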

Kind regards, Annette

Dear Annette,

I have a set of data from different populations (around 12). For each population, I have the percentages of individuals affected by three diseases, but in some cases I don’t have the raw data (some of the data come from the literature, so I only have access to N and the % of individuals affected). I wish to understand whether there is any significant difference between the several populations regarding each disease. Ideally, I would like to see whether I could trace any pattern in that differentiation, if it exists (such as geographic origin).

What would be the best strategy in this situation?

I am looking at coverage, in percentages, of different programs in the 4 years before and after the population lived in camps. I have hypothesized that there is no difference in performance before and after life in camps, as my main argument is that access to services was better while in camps. Most results show an increase in coverage before and a decrease after. Which test is best for testing the significance of differences in coverage before and after?

Thanks in advance