## Chi-Square Test of Independence

The chi-square test is a non-parametric test: it does not require the population to be normally distributed. It can be used to find an association (dependence) between two or more categorical variables by comparing how close the observed frequencies are to the expected frequencies. In other words, the chi-square ($\chi^2$) statistic is used to investigate whether the distributions of categorical variables differ from one another. Note that the observations must be independent of each other. We use the chi-square test to examine the relationship between two nominal-scaled variables.

The chi-square test is used both as a test of goodness of fit and as a test of independence. In a test of goodness of fit, we check whether the observed frequency distribution differs from a theoretical distribution, while in a test of independence we assess whether paired observations on two variables, arranged in a contingency table, are independent of each other.

Example: A social scientist sampled 140 people and classified them according to income level and whether or not they played a state lottery in the last month. The sample information is reported below. Is it reasonable to conclude that playing the lottery is related to income level? Use the 0.05 significance level.

| Income | Low | Middle | High | Total |
|---|---|---|---|---|
| Played | 46 | 28 | 21 | 95 |
| Did not play | 14 | 12 | 19 | 45 |
| Total | 60 | 40 | 40 | 140 |

The step-by-step procedure for testing the hypothesis of association between these two variables is described below.

Step 1: Hypotheses
$H_0$: There is no relationship between income and whether the person played the lottery.
$H_1$: There is a relationship between income and whether the person played the lottery.

Step 2: Level of significance: 0.05

Step 3: Test statistic (calculations)

| Observed frequencies ($f_o$) | Expected frequencies ($f_e$) | $\frac{(f_o - f_e)^2}{f_e}$ |
|---|---|---|
| 46 | $95 \times 60 / 140 = 40.71$ | $\frac{(46-40.71)^2}{40.71}$ |
| 28 | $95 \times 40 / 140 = 27.14$ | $\frac{(28-27.14)^2}{27.14}$ |
| 21 | $95 \times 40 / 140 = 27.14$ | $\frac{(21-27.14)^2}{27.14}$ |
| 14 | $45 \times 60 / 140 = 19.29$ | $\frac{(14-19.29)^2}{19.29}$ |
| 12 | $45 \times 40 / 140 = 12.86$ | $\frac{(12-12.86)^2}{12.86}$ |
| 19 | $45 \times 40 / 140 = 12.86$ | $\frac{(19-12.86)^2}{12.86}$ |

$\chi^2=\sum\left[\frac{(f_o-f_e)^2}{f_e}\right]=6.544$
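The arithmetic in this step can be checked with a short script. The following is a minimal sketch in plain Python; the function names `expected_frequencies` and `chi_square_statistic` are illustrative choices, not part of any library. Each expected frequency is computed as (row total × column total) / grand total, and the statistic sums $(f_o - f_e)^2 / f_e$ over all six cells.

```python
# Observed counts from the lottery example:
# rows = played / did not play, columns = low / middle / high income.
observed = [
    [46, 28, 21],
    [14, 12, 19],
]


def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(table, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed)

print([[round(v, 2) for v in row] for row in expected])
# [[40.71, 27.14, 27.14], [19.29, 12.86, 12.86]]
print(round(chi2, 3))
# 6.544
```

The script works from unrounded expected frequencies, so individual terms may differ slightly in the last digit from a hand calculation that uses the two-decimal values, but the total agrees to three decimals.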

Step 4: Critical Region:
The tabulated chi-square value at the 0.05 level of significance with $(r-1) \times (c-1)=(2-1)\times(3-1)=2$ degrees of freedom is 5.991.

Step 5: Decision
Since the calculated chi-square value (6.544) is greater than the tabulated chi-square value (5.991), we reject $H_0$ at the 0.05 level of significance and conclude that there is a relationship between income level and playing the lottery.
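The tabulated value and the decision can also be checked without a printed table. The sketch below relies on a special property of this example: with exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form $P(\chi^2 > x) = e^{-x/2}$, so the critical value is $-2\ln\alpha$. This shortcut does not hold for other degrees of freedom, and the helper names `chi2_sf_df2` and `chi2_critical_df2` are hypothetical, introduced only for illustration.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 2 df.

    Valid only for 2 degrees of freedom, where the tail is exp(-x / 2).
    """
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Critical value x such that P(X > x) = alpha, for 2 df only."""
    return -2 * math.log(alpha)


alpha = 0.05   # level of significance from Step 2
chi2 = 6.544   # calculated statistic from Step 3

critical = chi2_critical_df2(alpha)
p_value = chi2_sf_df2(chi2)
reject_h0 = chi2 > critical

print(round(critical, 3))  # 5.991
print(p_value)             # about 0.038, which is below alpha
print(reject_h0)           # True
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent decision rules; both lead to rejecting $H_0$ here.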

Note that several variants of the chi-square test are available (such as the chi-square test with Yates' correction, the likelihood-ratio test, and the portmanteau test in time series); which one applies depends on how the data were collected and on the hypothesis being tested.