Chi-Square Test of Independence: Complete Guide

Introduction to Chi-Square Test

The chi-square test is a non-parametric test: it does not require the population to be normally distributed. The chi-square ($\chi^2$) statistic measures how closely observed frequencies match the frequencies expected under a hypothesis. This makes it useful for detecting an association (dependence) between two or more categorical variables. Put another way, it tests whether the distributions of categorical variables differ. The individual observations must be independent of each other. In its most common form, the test of independence, it examines whether two nominal variables are related.
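For reference, the calculations in the example below follow the standard formulas. They are written here in this article's notation: $f_o$ is an observed frequency, $f_e$ an expected frequency, and $r$ and $c$ are the numbers of rows and columns of the contingency table. The grand total $n$ is introduced here for convenience.

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}, \qquad f_e = \frac{(\text{row total}) \times (\text{column total})}{n}, \qquad df = (r-1)(c-1)$$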

Test Assumptions

  • The data should be categorical.
  • The observations should be independent.
  • The expected frequency in each cell should be at least 5. If this assumption is violated, combine adjacent categories or use an alternative test, such as Fisher's exact test for 2 × 2 tables.
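The expected-frequency assumption can be checked before running the test. The following is a minimal Python sketch that uses the lottery table from the example below. The function names `expected_frequencies` and `min_expected_ok` are illustrative, and the threshold of 5 is the rule of thumb stated above.

```python
def expected_frequencies(table):
    """Expected count for each cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def min_expected_ok(table, threshold=5.0):
    """Return True if every expected frequency is at least `threshold`."""
    return all(e >= threshold for row in expected_frequencies(table) for e in row)


# Lottery example: rows = played / did not play,
# columns = low / middle / high income
observed = [[46, 28, 21],
            [14, 12, 19]]
print(min_expected_ok(observed))  # True: the smallest expected count is 45*40/140 ≈ 12.86
```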

Use and Application of Chi-Square Test

The chi-square statistic is used both as a test of goodness of fit and as a test of independence. A goodness-of-fit test checks whether the observed frequency distribution of a variable differs from a theoretical distribution. A test of independence, by contrast, assesses whether paired observations on two variables, summarized in a contingency table, are independent of each other.
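To make the goodness-of-fit case concrete, here is a minimal Python sketch. It asks whether the 140 respondents in the example below are spread equally across the three income levels. This question is a hypothetical illustration chosen for the sketch and is not part of the original example. The function name `goodness_of_fit_statistic` is also illustrative.

```python
def goodness_of_fit_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (f_o - f_e)^2 / f_e over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical question: are the income levels equally represented?
counts = [60, 40, 40]                      # column totals of the lottery table
n = sum(counts)
uniform = [n / len(counts)] * len(counts)  # about 46.67 expected per level under an equal split
stat = goodness_of_fit_statistic(counts, uniform)
print(round(stat, 3))  # 5.714
```

With $k = 3$ categories, this test has $k - 1 = 2$ degrees of freedom. A value of 5.714 therefore falls just below the 0.05 critical value of 5.991.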

Example of Chi-Square Test

A social scientist sampled 140 people and classified them according to income level and whether or not they played a state lottery in the last month. The sample information is reported below. Is it reasonable to conclude that playing the lottery is related to income level? Use the 0.05 significance level.

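$$
\begin{array}{l|ccc|c}
 & \text{Low income} & \text{Middle income} & \text{High income} & \text{Total} \\
\hline
\text{Played} & 46 & 28 & 21 & 95 \\
\text{Did not play} & 14 & 12 & 19 & 45 \\
\hline
\text{Total} & 60 & 40 & 40 & 140
\end{array}
$$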

The following steps test the hypothesis about the association between these two variables.

Step 1: Hypotheses
$H_0$: There is no relationship between income and whether the person played the lottery.
$H_1$: There is a relationship between income and whether the person played the lottery.

Step 2: Level of significance, $\alpha = 0.05$

Step 3: Test statistic (calculations)

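$$
\begin{array}{lccc}
\text{Cell} & f_o & f_e & \frac{(f_o - f_e)^2}{f_e} \\
\hline
\text{Played, Low} & 46 & 95 \times 60/140 = 40.71 & \frac{(46-40.71)^2}{40.71} \approx 0.686 \\
\text{Played, Middle} & 28 & 95 \times 40/140 = 27.14 & \frac{(28-27.14)^2}{27.14} \approx 0.027 \\
\text{Played, High} & 21 & 95 \times 40/140 = 27.14 & \frac{(21-27.14)^2}{27.14} \approx 1.390 \\
\text{Did not play, Low} & 14 & 45 \times 60/140 = 19.29 & \frac{(14-19.29)^2}{19.29} \approx 1.449 \\
\text{Did not play, Middle} & 12 & 45 \times 40/140 = 12.86 & \frac{(12-12.86)^2}{12.86} \approx 0.057 \\
\text{Did not play, High} & 19 & 45 \times 40/140 = 12.86 & \frac{(19-12.86)^2}{12.86} \approx 2.935
\end{array}
$$

The contributions in the last column are computed from the unrounded expected frequencies.

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.686 + 0.027 + 1.390 + 1.449 + 0.057 + 2.935 = 6.544$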
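The Step 3 arithmetic can be reproduced with a minimal pure-Python sketch. The function name `chi_square_independence` is illustrative. It computes each $f_e$ as row total × column total / grand total and then sums the cell contributions.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, expected table) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # f_e = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (f_o - f_e)^2 / f_e over every cell
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return stat, expected


observed = [[46, 28, 21],   # played: low, middle, high income
            [14, 12, 19]]   # did not play
stat, expected = chi_square_independence(observed)
print(round(stat, 3))  # 6.544
```

If SciPy is available, `scipy.stats.chi2_contingency(observed)` should report the same statistic for this table. SciPy applies its Yates continuity correction only when there is 1 degree of freedom, and this table has 2.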

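Step 4: Critical region
The tabulated chi-square value at the 0.05 level of significance with $(r-1) \times (c-1) = (2-1) \times (3-1) = 2$ degrees of freedom is 5.991. Here $r$ is the number of rows and $c$ the number of columns. $H_0$ is rejected if $\chi^2 > 5.991$.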
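The degrees of freedom and the tabulated value can be checked in Python. This is a sketch under one stated assumption: the closed form below holds only for exactly 2 degrees of freedom. With 2 df, the chi-square distribution is an exponential distribution with mean 2, so the upper-tail critical value is $-2\ln\alpha$. Other df values need a table or a library routine such as SciPy's `scipy.stats.chi2.ppf`. The helper names are illustrative.

```python
import math


def degrees_of_freedom(rows, cols):
    """Degrees of freedom for an r x c test of independence."""
    return (rows - 1) * (cols - 1)


def critical_value_df2(alpha):
    """Upper-tail chi-square critical value, valid ONLY for 2 degrees of freedom.

    With 2 df, P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(2, 3)
print(df)                                  # 2
print(round(critical_value_df2(0.05), 3))  # 5.991
```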

Step 5: Decision
Because the calculated chi-square value (6.544) is greater than the tabulated value (5.991), we reject $H_0$ at the 0.05 level. We conclude that there is a relationship between income level and playing the lottery.
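The decision rule can also be expressed as a p-value. The sketch below assumes exactly 2 degrees of freedom, as in this example, so the upper-tail p-value is $e^{-\chi^2/2}$. Comparing the p-value with $\alpha$ gives the same decision as comparing the statistic with the critical value. The function name `p_value_df2` is illustrative.

```python
import math


def p_value_df2(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


chi_sq = 6.544    # calculated statistic from Step 3
alpha = 0.05      # level of significance from Step 2
critical = 5.991  # tabulated value from Step 4 (2 df)

reject_by_critical_value = chi_sq > critical
p = p_value_df2(chi_sq)
reject_by_p_value = p < alpha

print(reject_by_critical_value, reject_by_p_value)  # True True
print(round(p, 3))                                   # 0.038
```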

Note that there are several variants of the chi-square test, such as Yates's continuity correction, the likelihood-ratio test, and chi-square tests used in time series analysis. The appropriate variant depends on how the data were collected and on the hypothesis being tested.
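Here is a minimal sketch of one variant named above, the likelihood-ratio (G) test. It replaces the sum of $(f_o - f_e)^2/f_e$ with $G = 2\sum f_o \ln(f_o/f_e)$. Under $H_0$, $G$ is also compared with a chi-square distribution on $(r-1)(c-1)$ degrees of freedom. The function name `g_statistic` is illustrative. Yates's correction is not shown because it is normally applied only to 2 × 2 tables.

```python
import math


def g_statistic(table):
    """Likelihood-ratio (G) statistic: 2 * sum(f_o * ln(f_o / f_e)), skipping zero cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            if fo > 0:  # a zero observed count contributes 0 (limit of x ln x as x -> 0)
                g += fo * math.log(fo / fe)
    return 2.0 * g


observed = [[46, 28, 21],
            [14, 12, 19]]
print(round(g_statistic(observed), 2))  # 6.41
```

For the lottery data, $G$ is close to the Pearson statistic of 6.544 and also exceeds the 5.991 critical value.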
