Variance: A Measure of Dispersion

Variance is a measure of the dispersion of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (a data set) is defined as the mean of the squared deviations of the observations from their mean. When it is computed for an entire population, it is called the population variance and is usually denoted by $\sigma^2$; when it is computed for sample data, it is called the sample variance and is denoted by $S^2$ to distinguish it from the population variance. The variance of a random variable $X$ is also written $Var(X)$. The symbolic definitions of the population and sample variance are

$\sigma^2=\frac{\sum (X_i - \mu)^2}{N}; \quad \text{for population data}$

$S^2=\frac{\sum (X_i - \overline{X})^2}{n-1}; \quad \text{for sample data}$
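
As a minimal sketch of the two formulas above, the Python snippet below computes both variances directly from their definitions and cross-checks them against the standard library functions `statistics.pvariance` and `statistics.variance`. The function names `population_variance` and `sample_variance` and the data values are illustrative assumptions, not part of the original text.

```python
from statistics import pvariance, variance


def population_variance(xs):
    """Mean of the squared deviations from the mean (divide by N)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)
samp_var = sample_variance(data)

# The hand-written versions should agree with the standard library.
assert abs(pop_var - pvariance(data)) < 1e-12
assert abs(samp_var - variance(data)) < 1e-12

print(pop_var)   # 4.0
print(samp_var)  # 32 / 7, roughly 4.571
```

For the same data the sample variance is always the larger of the two, because the same sum of squares is divided by $n-1$ instead of $N$.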

It should be noted that the variance is expressed in the square of the units in which the observations are measured, so its numerical value is often large compared with the observations themselves. Because of its convenient mathematical properties, the variance plays an extremely important role in statistical theory.

Variance can be computed from the standard deviation, because the variance is the square of the standard deviation, i.e. $\text{Variance} = (\text{Standard Deviation})^2$.

Variance can be used to compare the dispersion of two or more sets of observations. Variance can never be negative, since every term in its sum is a squared quantity and is therefore either positive or zero.

To calculate the standard deviation, follow these steps:

1. Find the mean of the data.
2. Take the difference of each observation from the mean of the data set. The sum of these differences should be zero; a value slightly different from zero can arise from rounding.
3. Square the differences obtained in step 2. Each squared value is greater than or equal to zero, i.e. it is a non-negative quantity.
4. Sum all the squared quantities obtained in step 3. This is called the sum of squares of differences.
5. Divide this sum of squares of differences by the total number of observations to obtain the population variance ($\sigma^2$). For the sample variance ($S^2$), divide the sum of squares of differences by the total number of observations minus one, i.e. the degrees of freedom.
6. Take the square root of the quantity obtained in step 5. The result is the standard deviation ($\sigma$ or $S$) of the given data set.
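
The steps above can be sketched in Python as follows. The function name `standard_deviation`, its `sample` flag, and the data values are assumptions made for illustration; each comment marks the part of the procedure it corresponds to.

```python
import math


def standard_deviation(xs, sample=True):
    """Follow the procedure step by step; returns S if sample else sigma."""
    n = len(xs)

    # Find the mean of the data.
    mean = sum(xs) / n

    # Difference of each observation from the mean; these sum to (about) zero.
    deviations = [x - mean for x in xs]
    assert math.isclose(sum(deviations), 0.0, abs_tol=1e-9)

    # Square each difference; every squared value is non-negative.
    squared = [d ** 2 for d in deviations]

    # Sum of squares of differences.
    sum_of_squares = sum(squared)

    # Divide by n - 1 (degrees of freedom) for a sample, by n for a population.
    divisor = n - 1 if sample else n
    variance = sum_of_squares / divisor

    # Final step: the square root of the variance is the standard deviation.
    return math.sqrt(variance)


# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = standard_deviation(data, sample=False)  # population standard deviation
s = standard_deviation(data, sample=True)       # sample standard deviation

print(sigma)  # 2.0
print(s)      # sqrt(32 / 7), roughly 2.138
```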

The major characteristics of the variance are:
a)    All of the observations are used in the calculation.
b)    Variance is strongly influenced by extreme observations, because each deviation from the mean is squared.
c)    The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.
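
Characteristic (c) can be checked numerically. In the hedged sketch below, the same hypothetical lengths are expressed in metres and in centimetres; because the variance is in squared units, changing the unit by a factor of 100 changes the variance by a factor of $100^2$. The variable names and values are illustrative assumptions.

```python
from statistics import variance

# Hypothetical lengths, first in metres and then in centimetres.
lengths_m = [2, 4, 6]
lengths_cm = [100 * x for x in lengths_m]

var_m = variance(lengths_m)    # sample variance in square metres
var_cm = variance(lengths_cm)  # sample variance in square centimetres

# The ratio equals the square of the unit conversion factor.
ratio = var_cm / var_m
print(ratio == 100 ** 2)  # True
```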

Sampling Basics

It is often necessary to collect information about a population. There are two methods for collecting the required information:

• Complete information
• Sampling

Complete Information

In this method, the required information is collected from each and every individual in the population. It is used when it is difficult to draw a conclusion (inference) about the population on the basis of sample information. This method is costly and time-consuming. It is also called complete enumeration or a population census.

Sampling

Sampling is the most commonly and widely used method of collecting information. In this method, instead of studying the whole population, only a small part of the population is selected and studied, and the result is applied to the whole population. For example, a cotton dealer picks a small quantity of cotton from different bales in order to judge the quality of the cotton.
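
As a rough illustration of studying a small part and applying the result to the whole, the sketch below draws a simple random sample without replacement using Python's `random` module and compares the sample mean with the population mean. The simulated population, the sample size of 100, and the choice of simple random sampling are all assumptions made for this example; the text above does not prescribe a particular sampling design.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements (one per unit).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sample without replacement: only 100 units are studied.
sample = rng.sample(population, k=100)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # estimate applied to the whole population

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In a real survey the population mean would not be available for comparison; it is computed here only to show how close the estimate from 100 units comes to the value for all 10,000.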

Purpose or objective of sampling

The two basic purposes of sampling are:

1. To obtain the maximum information about the population without examining each and every unit of the population.
2. To assess the reliability of the estimates derived from the sample, which can be done by computing the standard error of the statistic.
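
For the second purpose, a minimal sketch is given below for one common case: the statistic is the sample mean, and its standard error is estimated as $S/\sqrt{n}$, where $S$ is the sample standard deviation. This assumes independent observations from a simple random sample; the function name `standard_error_of_mean` and the data are hypothetical.

```python
import math
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor


# Hypothetical sample, chosen only for illustration.
sample_data = [2, 4, 4, 4, 5, 5, 7, 9]

se = standard_error_of_mean(sample_data)
print(round(se, 4))  # 0.7559, i.e. sqrt(4 / 7)
```

A smaller standard error indicates that the sample mean is a more reliable estimate of the population mean.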

Advantages of sampling over complete enumeration

1. It is much cheaper to collect the required information from a sample than by complete enumeration, because fewer units are studied in a sample than in the population.
2. Data can be collected from a sample more quickly, which saves a great deal of time.
3. Planning for a sample survey can be done more carefully and easily than for complete enumeration.
4. Sampling is the only available method of collecting the required information when studying the objects, subjects, or individuals in the population is destructive.
5. Sampling is the only available method of collecting the required information when the population is infinite or very large.
6. The most important advantage of sampling is that it provides a measure of the reliability of the estimates.
7. Sampling is extensively used to obtain some of the census information.

For further reading visit: Sampling Theory and Reasons to Sample