A bivariate relationship is defined by the joint distribution of the two associated random variables.
Contingency Tables
Let and are two categorical response variables. Let variable have levels and variable have levels. The possible combinations of classifications for both variables are . The response of a subject randomly chosen from some population has a probability distribution, which can be shown in a rectangular table having rows (for categories of ) and columns (for categories of ). The cells of this rectangular table represent the possible outcomes. Their probability (say ) denotes the probability that () falls in the cell in row and column . When these cells contain frequency counts of outcomes, the table is called contingency or crossclassification table and it is referred to as an by () table.
The probability distribution {} is the joint distribution of and . The marginal distributions are the rows and columns totals obtained by summing the joint probabilities. For the row variable () the marginal probability is denoted by and for column variable () it is denoted by , where the subscript “+” denotes the sum over the index it replaces; that is, and satisfying
Note that the marginal distributions are singlevariable information, and do not pertain to association linkages between the variables.
In (many) contingency tables, one variable (say, ) is a response and the other ) is an explanatory variable. When is fixed rather than random, the notation of a joint distribution for and is no longer meaningful. However, for a fixed level of , the variable has a probability distribution. It is germane to study how this probability distribution of changes as the level of changes.
Like this:
Like Loading...
Chisquare test is a nonparametric test. The assumption of normal distribution in the population is not required for this test. The statistical technique chisquare can be used to find the association (dependencies) between sets of two or more categorical variables by comparing how close the observed frequencies are to the expected frequencies. In other words, a chi square () statistic is used to investigate whether the distributions of categorical variables different from one another. Note that the response of categorical variables should be independent from each other. We use the chisquare test for relationship between two nominal scaled variables.
Chisquare test of independence is used as tests of goodness of fit and as tests of independence. In test of goodness of fit, we check whether or not observed frequency distribution is different from the theoretical distribution, while in test of independence we assess, whether paired observations on two variables are independent from each other (from contingency table).
Example: A social scientist sampled 140 people and classified them according to income level and whether or not they played a state lottery in the last month. The sample information is reported below. Is it reasonable to conclude that playing the lottery is related to income level? Use the 0.05 significance level.

Income 
Low 
Middle 
High 
Total 
Played 
46 
28 
21 
95 
Did not play 
14 
12 
19 
45 
Total 
60 
40 
40 
140 
Step by step procedure of testing of hypothesis about association between these two variable is described, below.
Step1:
: There is no relationship between income and whether the person played the lottery.
: There is relationship between income and whether the person played the lottery.
Step2: Level of Significance 0.05
Step 3: Test statistics (calculations)
Observed Frequencies () 
Expected Frequencies () 

46 
95*60/140= 40.71 

28 
95*40/140= 27.14 

21 
95*40/140= 27.14 

14 
45*60/140= 19.29 

12 
45*40/140= 12.86 

19 
45*40/140= 12.86 


6.544 
Step 4: Critical Region:
Tabular ChiSquare value at 0.05 level of significance and is 5.991.
Step 5: Decision
As calculated ChiSquare value is greater than tabular ChiSquare value, we reject , which means that there is relationship between income level and playing the lottery.
Note that there are several types of chisquare test (such as Yates, Likelihood ratio, Portmanteau test in time series) available which depends on the way data was collected and also the hypothesis being tested.
Like this:
Like Loading...