How to do a Chi-square test when you only have proportions and denominators


by Annette Gerritsen, Ph.D.

In an earlier article I discussed how to do a cross-tabulation in SPSS. But what if you do not have a data set with the values of the two variables of interest?

This happens, for example, when you critically appraise a published study and only have proportions and denominators.

This article demonstrates how SPSS can produce a cross table and do a Chi-square test in both situations. And you will see that the results are exactly the same.

‘Normal’ dataset

If you want to test if there is an association between two nominal variables, you do a Chi-square test.

In SPSS you indicate that one variable (the independent one) goes in the rows and the other variable (the dependent one) goes in the columns of the cross table. Then you ask for row percentages and the Chi-square statistic.

The output will give you the cross table with the numbers and row percentages, and a table including the value of the Pearson Chi-square together with a p-value. [Don’t forget to check whether the Chi-square test is valid: at least 80% of the expected frequencies exceed 5 and all the expected frequencies exceed 1.]
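The validity rule in brackets can also be checked by hand from the row and column totals. The sketch below is a minimal illustration in Python rather than SPSS. The function names and both example tables are made up for this illustration, and it reads "exceed" as "at least", in line with the way SPSS counts cells with an expected count less than 5.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_valid(table):
    """Rule of thumb: at least 80% of expected counts >= 5 and all of them >= 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= 0.80 and min(cells) >= 1


# Two hypothetical 2x2 tables (rows: groups, columns: outcome yes/no).
roomy_table = [[12, 8], [3, 7]]    # expected counts 10, 10, 5, 5
sparse_table = [[3, 12], [2, 13]]  # expected counts 2.5, 12.5, 2.5, 12.5

print(chi_square_is_valid(roomy_table))
print(chi_square_is_valid(sparse_table))
```

The first table passes the check and the second does not, because half of its expected counts fall below 5.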

For example, to see if the chance of getting a disease (e.g. an infection as complication after an operation) is different between those using medication A as prophylaxis and those using medication B, you would ask for a cross table of disease by medication. The SPSS output is printed below.

Only proportions and denominators available

But how do you do a Chi-square test when you only have proportions and denominators available?

For example, you know from the literature that 33.0% of 276 people using medication A got the disease, while 34.4% of the 392 persons getting medication B got the disease.

How do you know if this is significantly different or not?

The first step is to construct the cross table yourself. Determine what figure should come in the cell for which variable 1 (medication) equals 1 and variable 2 (disease) equals 1. This is 0.330 * 276 = 91.08, which rounds to 91.

You do the same for the cell for which variable 1 equals 2 and variable 2 equals 1 (0.344 * 392 = 134.8, which rounds to 135). To get the figure for the cell for which variable 1 equals 1 and variable 2 equals 2, you subtract the 91 from the 276 and get 185. You follow the same procedure for the cell for which variable 1 equals 2 and variable 2 equals 2 (392 – 135 = 257).
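The arithmetic for the four cells can be sketched as follows. This is a small Python illustration, not part of the SPSS workflow. The helper name `cells_from_proportion` is made up for this example, and it uses the published percentages (33.0% and 34.4%) together with the two denominators.

```python
def cells_from_proportion(proportion, denominator):
    """Return (with_outcome, without_outcome) counts for one group."""
    with_outcome = round(proportion * denominator)
    return with_outcome, denominator - with_outcome


a_yes, a_no = cells_from_proportion(0.330, 276)  # medication A
b_yes, b_no = cells_from_proportion(0.344, 392)  # medication B

print(a_yes, a_no, b_yes, b_no)  # prints: 91 185 135 257
```

Because the cell counts are rounded, they can differ by one from the true counts when the published percentage was itself rounded.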

You now have to enter these data into SPSS in the following way:

medication   disease   count
1            1            91
2            1           135
1            2           185
2            2           257

In order to have SPSS produce the cross table and calculate the Chi-square value, you use the ‘weight by’ option. The SPSS syntax is printed below.

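* Let each row stand for 'count' cases instead of a single case.
WEIGHT BY count.

* Cross table of medication by disease with row percentages and Chi-square.
CROSSTABS
  /TABLES=medication BY disease
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ RISK
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.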

And you will get the same output as with the ‘normal’ dataset, provided the rounded cell counts match the original counts. So when you need to know whether there is an association between two nominal variables and you do not have the original dataset, knowing the proportions and denominators is enough to get your answer.
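If you want to double-check the Pearson Chi-square outside SPSS, the calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, no continuity correction is applied, and the p-value shortcut based on `math.erfc` holds only for 1 degree of freedom (a 2x2 table).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square (no continuity correction) and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_one_df(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: medication A, medication B; columns: disease, no disease.
table = [[91, 185], [135, 257]]
chi2, df = pearson_chi_square(table)
print(round(chi2, 3), df, round(p_value_one_df(chi2), 3))
```

For these counts the statistic is about 0.16 with 1 degree of freedom and a p-value of about 0.69, so the difference between 33.0% and 34.4% is not statistically significant.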

About the Author: With expertise in epidemiology, biostatistics and quantitative research projects, Annette Gerritsen, Ph.D. provides services to her clients focusing on the methodological soundness of each phase of an epidemiological study to ensure getting valid answers to the proposed research questions. She is the founder of Epi Result.



24 comments

Tareque July 8, 2011 at 9:03 am

Dear Ms. Annette:

I have a problem. I have a total of 121 respondents in a survey. Of those, 42 responded early to my questionnaire, while the remaining 79 respondents replied later. How can I use a chi-square test to find the significance value?

Thanks


Karen July 13, 2011 at 1:54 pm

Dear Tareque,

Thanks for your comment.

However, a chi-square test is used to compare proportions between two or more groups, and you only have one group.

I am not sure what your hypothesis is. Are you testing whether the numbers of early and late respondents differ? In that case, you could do a test of proportions. See the following link for a step-by-step explanation of how to do this manually: http://davidmlane.com/hyperstat/B71928.html. For help doing this in SPSS, see the following link (binomial test): http://www.ats.ucla.edu/stat/spss/whatstat/whatstat.htm. I hope this will assist you.

Kind regards,

Annette
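As a rough illustration of the binomial test mentioned in this reply, the sketch below computes an exact two-sided p-value for 42 early respondents out of 121. The null hypothesis of a 50/50 split is an assumption made for this example, since the commenter's hypothesis is not stated, and the function name is made up.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# 42 early respondents out of 121, tested against an assumed 50/50 split.
p_value = binomial_two_sided_p(42, 121)
print(p_value)
```

Under that assumed null hypothesis the p-value is well below 0.05, so a 42 versus 79 split would be unlikely if early and late responses were equally probable.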


Shameem April 7, 2012 at 1:20 pm

Dear Annette,
I would like to prepare a test result using a Chi-square test between two groups, an intervention group and a control group. How can I use the Chi-square test to find the significance value for some specific variables, that is, whether there is a significant difference between the groups?
Regards,
Shameem


Annette Gerritsen April 11, 2012 at 9:22 am

Dear Shameem,

If you want to compare an intervention group and a control group on a categorical (not continuous) outcome variable, you can indeed use the Chi-square test. If you put the group (independent) variable in the rows of the table (first row intervention, second row control) and the outcome (dependent) variable in the columns and ask for row percentages, you will see whether the percentages for each value of the outcome variable differ between the 2 groups. The Chi-square test then indicates whether these differences are statistically significant.

Kind regards,

Annette


xsteff May 10, 2012 at 8:31 am

Is there any journal or article regarding the test of proportions related to nursing?


mustapha February 10, 2013 at 1:56 am

Dear Annette,
I am testing just three hypotheses for my research work and I want to use 5 questions on my questionnaire for one hypothesis. How can I do it?
Thanks


Karen February 13, 2013 at 3:05 pm

Hi Mustapha, I’d need a lot more info to answer that. It’s all in the details.

Karen


Gurps March 8, 2013 at 10:49 am

I have several pieces of data comparing males and females in yes/no situations. I want to find out whether there is a statistically significant difference between males and females.

I have used SPSS and input my data as

Female Yes 20
Male Yes 6
Female No 33
Male No 21

I then go to: analyse – non parametric tests – chi squared – select the column with the numbers as count and press OK.

This always gets me a chi squared value of .000^ and DF as 3 and Asymp.Sig = 1.000.

These values are the same NO MATTER what count values I put into SPSS.
However, my tutor has said I cannot use crosstabs.
I have been told that I should assume the values I get from simply using Chi Squared are correct,
but all the values I get seem too good to be true.

Help ??

Thank you in advance


Karen March 13, 2013 at 9:23 am

I’m not sure why you shouldn’t use crosstabs. That’s what I would use, unless I’m not understanding something….
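A likely explanation for the output described above, offered as an assumption since the data file itself is not available: the one-sample chi-square test under the non-parametric menu treats every distinct value of the selected variable as a category and tests whether those categories occur equally often. With four different counts in the column, each value occurs exactly once, so observed and expected frequencies are all 1. The Python sketch below mimics that calculation; the function name is made up for this illustration.

```python
from collections import Counter


def one_sample_chi_square(values):
    """Goodness-of-fit test that every distinct value occurs equally often."""
    observed = Counter(values)
    expected = len(values) / len(observed)
    chi2 = sum((freq - expected) ** 2 / expected for freq in observed.values())
    return chi2, len(observed) - 1


# The 'count' column as entered in the comment: each value occurs once.
print(one_sample_chi_square([20, 6, 33, 21]))  # prints: (0.0, 3)

# Any other four distinct numbers give the same result.
print(one_sample_chi_square([1, 2, 3, 4]))     # prints: (0.0, 3)
```

This matches a Chi-square of .000 with 3 degrees of freedom regardless of the numbers entered, which is why the weighted cross table from the article is the appropriate route for this kind of data.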


Nashwa Nashaat June 25, 2013 at 10:23 am

Dear Annette,
I want to see if a certain treatment affects a number of linguistic skills. I suppose I can use the chi-square test to determine if the treatment caused significant differences between these groups. My data was originally in frequencies, which I transformed into percentages. Can you please explain, or direct me to a link that would help me, how to arrange the data in SPSS?
Thank you


Stephanie Kennedy October 25, 2013 at 2:14 pm

I have a 12×7 table of variables. I have 279 questionnaires answered but I still have 79.6% of my cells less than 5. What test could I use instead of the chi-square to test for significance?


Karen October 28, 2013 at 9:56 am

Hi Stephanie,

First of all, the assumption is not that observed values are greater than 5, it’s that expected values are. So I’ll assume that’s your situation.

A Fisher’s exact test would work, but it’s going to take a while to run with a table that size. If you have a Monte Carlo option in your software, that would be the best choice.
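To give an idea of what a Monte Carlo approach does, here is a minimal sketch in Python. It is not SPSS's own procedure: it shuffles the column labels many times, which keeps the row and column totals fixed, and counts how often the shuffled Chi-square is at least as large as the observed one. The function names, the number of simulations, the seed and the data are all made up for this illustration.

```python
import random


def chi_square(rows, cols):
    """Pearson chi-square for two parallel lists of category labels."""
    n = len(rows)
    row_tot, col_tot, cell = {}, {}, {}
    for r, c in zip(rows, cols):
        row_tot[r] = row_tot.get(r, 0) + 1
        col_tot[c] = col_tot.get(c, 0) + 1
        cell[(r, c)] = cell.get((r, c), 0) + 1
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (cell.get((r, c), 0) - expected) ** 2 / expected
    return stat


def monte_carlo_p(rows, cols, n_sim=2000, seed=1):
    """Estimate the p-value by shuffling column labels (margins stay fixed)."""
    rng = random.Random(seed)
    observed = chi_square(rows, cols)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if chi_square(rows, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Made-up data: one row label and one column label per respondent.
rows = ["a"] * 10 + ["b"] * 10
cols = ["x"] * 8 + ["y"] * 2 + ["x"] * 3 + ["y"] * 7
print(monte_carlo_p(rows, cols))
```

The same functions work for larger tables such as a 12×7 one, because they only need the two label lists and never rely on the expected counts being large.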


safna January 20, 2014 at 12:09 pm

Dear Annette,
I would like to submit a project report on “motivation and its impact on employee performance”. How can I use the chi-square test?


tientje January 22, 2014 at 4:11 am

Dear Annette,

I have 2 groups of subjects who each received a different treatment. The first group consisted of 239 subjects and the other group consisted of 119 subjects. What was measured during the experiment was the occurrence of events. So in the first group of 239 subjects, 730 events occurred during the study. In the second group of 119 subjects, 393 events occurred. Should I use a chi-square test to check if there are more events in the one group compared to the other?
Thanks for your advice


Annette January 27, 2014 at 4:03 am

Dear Tientje,
It would be best to do a Poisson regression, as you are dealing with count data. Check out this link for some help: http://www.theanalysisfactor.com/poisson-and-negative-binomial-regression/
Kind regards, Annette
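For a first impression before fitting a regression, the two event rates can be compared directly. The sketch below is a simplified stand-in for a Poisson regression with group as the only predictor, not the regression itself. It assumes Poisson-distributed events, equal follow-up time per subject and no overdispersion, none of which can be verified from the comment. The function name is made up.

```python
import math


def compare_event_rates(events_1, n_1, events_2, n_2):
    """Wald test on the log rate ratio of two Poisson event rates."""
    rate_1 = events_1 / n_1
    rate_2 = events_2 / n_2
    log_ratio = math.log(rate_2 / rate_1)
    se = math.sqrt(1 / events_1 + 1 / events_2)  # standard error of the log rate ratio
    z = log_ratio / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal p-value
    return rate_1, rate_2, math.exp(log_ratio), z, p


# 730 events in 239 subjects versus 393 events in 119 subjects.
rate_1, rate_2, ratio, z, p = compare_event_rates(730, 239, 393, 119)
print(round(rate_1, 2), round(rate_2, 2), round(ratio, 2), round(p, 2))
```

If some subjects contribute many more events than others, the counts are overdispersed and this simple test will be too optimistic, which is where the negative binomial model discussed at the link comes in.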


Jack February 14, 2014 at 7:42 pm

Dear Annette,

I have a set of data from different populations (around 12). For each population, I have the percentages of individuals affected by three diseases, but in some cases I don’t have the raw data (some of the data comes from literature, so I only have access to N and % of individuals affected). I wish to understand if there is any significant difference between the several populations, regarding each disease. Ideally, I would like to try and see if I could trace any pattern in that differentiation, if it does exist (such as geographic origin).

What would be the best strategy in this situation?


Janet March 25, 2014 at 10:20 pm

I am looking at coverage in percentages of different programs 4 years before and after the population lived in camps. I have hypothesized that there is no difference in performance before and after life in camps as my main argument is that access to services was better while in camps. Most results show increase in coverage before and decrease after. Which test is the best for testing significance in differences in coverage before and after?
Thanks in advance


Charles Odongo May 22, 2014 at 1:40 am

Dear Dr. Annette,
I performed a simple experiment comparing the performance of two malaria tests (RDT or expert microscopy) when it comes to tie-breaking the dilemma posed by negative results initially obtained by routine microscopy. Out of 414 cases subjected to both tests, 14 were positive by RDT and 6 by expert microscopy (giving proportions of 3.38% and 1.44%). Is it right to compare these proportions using a chi-square test? If so, how can I use this test in Stata?
Charles


Mustafa May 22, 2014 at 4:45 pm

Dear Annette,

I have data for 2 years, 2012 and 2013, with the same variables. I realized the sample size is not equal for the two years, and I want to compare the years. How can I make the comparison in this case?


Eric June 11, 2014 at 9:42 pm

Dear Dr. Annette,

I have a question about crosstabs (chi-square test). Every time I analyse my results, the output includes the sentence:
“1 cells (25.0%) have expected count less than 5. The minimum expected count is 3.75.”
How can I overcome this problem?

Thanks.


ann June 19, 2014 at 7:22 am

Dear Annette,

I have done a study screening for hepatitis B for my course. All of the results were negative. I want to use chi-square to compare gender (males and females) with the result (positive and negative). However, instead of a p-value, the output says that no statistics are computed because result is a constant. Is there any solution for this problem? Thanks!


Irfan June 23, 2014 at 1:14 am

My hypothesis is that there is a significant response of customers towards the Maruti Suzuki Swift. How do I design my questionnaires based on the chi-square method?


Falaye September 24, 2014 at 6:08 am

Dear Annette,

I have data sets for the state of water service before and after a World Bank loan intervention using 4 metrics (i.e. duration of water supply, rating of water quality, quantity of water consumed, and accessibility). The data has been scaled from 1 to 5.

I have used an independent t-test for the analysis of relationships, but my supervisor asked me to use chi-square.

Your advice is needed on how to do the analysis and on the best statistical test to use.

Thanks


Laurie October 14, 2014 at 3:36 pm

Hi Annette-

I am conducting a meta-analysis on gender differences in HIV risk behavior among injection drug users.
One study reported that 7.2% of men and 39.6% of women had sex work clients of the opposite sex in the last 6 months. Using the method you explained, I calculated from a sample size of 818 men and 242 women (total N=1060) that this was 59 men and 91 women. I then subtracted 818 (sample men) - 59 (men who had clients), which equals 759 who did not have clients. I did the same for women, 242 - 91 = 151, and put the values into SPSS as follows:

Gender   clients   Count
1        1            59
2        1            91
1        2           759
2        2           151
Total = 1060

1=Males 1=had clients
2=Females 2=did not have clients

I then weighted the calculation by “Count” and ran the statistics. For a few of these tests, this has given me values over 100 for the Chi-Square stat. I was told by my advisor that Chi Square values should not be over 100. I checked my entries many times and am still getting these high values. What am I doing incorrectly? Thanks so much for any help you can give!
Sincerely,
-Laurie Pennington
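The counts in this comment can be checked with the shortcut formula for a 2x2 table, chi-square = N(ad - bc)^2 / (product of the four row and column totals). The Python sketch below uses the counts exactly as entered in the comment; the function name is made up. One caveat: 39.6% of 242 rounds to 96 rather than 91, so either the percentage or the count may contain a slip, and the code does not try to resolve that.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: men, women; columns: had clients, did not have clients.
chi2 = chi_square_2x2(59, 759, 91, 151)
print(round(chi2, 1))

# Same proportions with ten times as many people (hypothetical):
# the statistic becomes ten times larger.
print(round(chi_square_2x2(590, 7590, 910, 1510), 1))
```

With these counts the statistic is roughly 142. For a 2x2 table the Pearson Chi-square can be as large as the total sample size, so values above 100 are possible with N = 1060 and a large difference between the groups.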

