
Contingency Table | Cross Classification: Introduction

A bivariate relationship is defined by the joint distribution of the two associated random variables.

Contingency Tables

Let X and Y be two categorical response variables, where X has I levels and Y has J levels. There are I \times J possible combinations of classifications. The response (X, Y) of a subject randomly chosen from some population has a probability distribution, which can be displayed in a rectangular table with I rows (for the categories of X) and J columns (for the categories of Y). The cells of this table represent the IJ possible outcomes, and \pi_{ij} denotes the probability that (X, Y) falls in the cell in row i and column j. When the cells contain frequency counts of outcomes, the table is called a contingency table or cross-classification table, and it is referred to as an I by J (I \times J) table.
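As a rough illustration of how such a table of counts arises from raw responses, the following minimal sketch tallies (X, Y) pairs into an I by J grid. The function name `contingency_table`, the level labels, and the eight observations are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def contingency_table(pairs, x_levels, y_levels):
    """Return an I x J list of lists holding the count of each (x, y) pair.

    Rows follow the order of x_levels, columns the order of y_levels.
    """
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(x, y)] for y in y_levels] for x in x_levels]

# Hypothetical sample: one (X, Y) response per subject
observations = [
    ("low", "yes"), ("low", "no"), ("low", "no"),
    ("high", "yes"), ("high", "yes"), ("high", "no"),
    ("low", "no"), ("high", "yes"),
]

table = contingency_table(observations, ["low", "high"], ["yes", "no"])
for row in table:
    print(row)
```

Here I = 2 and J = 2, so the result is a 2 \times 2 table with one cell per possible outcome.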

The probability distribution {\pi_{ij}} is the joint distribution of X and Y. The marginal distributions are the row and column totals obtained by summing the joint probabilities. For the row variable (X) the marginal probability is denoted by \pi_{i+}, and for the column variable (Y) it is denoted by \pi_{+j}, where the subscript “+” denotes the sum over the index it replaces; that is, \pi_{i+}=\sum_j \pi_{ij} and \pi_{+j}=\sum_i \pi_{ij}, which satisfy

\sum_{i} \pi_{i+} =\sum_{j} \pi_{+j} = \sum_i \sum_j \pi_{ij}=1

Note that the marginal distributions carry single-variable information only; they say nothing about the association between the variables.
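The marginal sums can be checked numerically. The sketch below assumes a made-up 2 \times 2 joint distribution (the values in `pi` are illustrative, not from any data set) and computes \pi_{i+} and \pi_{+j} by summing over rows and columns; the helper name `marginals` is likewise hypothetical.

```python
def marginals(pi):
    """Return (row marginals pi_{i+}, column marginals pi_{+j}) of a joint table."""
    row = [sum(r) for r in pi]        # sum over j for each row i
    col = [sum(c) for c in zip(*pi)]  # sum over i for each column j
    return row, col

# Hypothetical joint distribution {pi_ij}; entries sum to 1
pi = [
    [0.125, 0.375],
    [0.375, 0.125],
]

row_marginals, col_marginals = marginals(pi)
print(row_marginals, col_marginals)
print(sum(row_marginals), sum(col_marginals))
```

Both sets of marginals sum to the same total as the joint probabilities, as the identity above requires.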

In many contingency tables, one variable (say, Y) is a response and the other (X) is an explanatory variable. When X is fixed rather than random, the notion of a joint distribution for X and Y is no longer meaningful. However, for a fixed level of X, the variable Y has a probability distribution. It is then of interest to study how this probability distribution of Y changes as the level of X changes.
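One way to look at the distribution of Y at each fixed level of X is to divide every row of the table by its row total. The sketch below does this for a hypothetical 2 \times 2 table of counts; the function name `conditional_on_rows` is an assumption made for illustration, and the resulting row-wise proportions are often written \pi_{j|i}, a notation not used above.

```python
def conditional_on_rows(table):
    """Return the distribution of Y within each row (level of X).

    Each row of the result sums to 1. A row with zero total has no
    defined conditional distribution, so it raises ValueError.
    """
    result = []
    for row in table:
        total = sum(row)
        if total == 0:
            raise ValueError("row total is zero; conditional distribution undefined")
        result.append([cell / total for cell in row])
    return result

# Hypothetical counts: rows are levels of X, columns are levels of Y
counts = [
    [1, 3],
    [3, 1],
]

conditional = conditional_on_rows(counts)
for row in conditional:
    print(row)
```

If the rows of the result differ from one another, the distribution of Y changes with the level of X; identical rows would indicate no such change in this sample.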

