Variance is a measure of the dispersion of the distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squared deviations of all the observations from their mean. When it is computed for an entire population, the variance is called the population variance, usually denoted by $\sigma^2$, while for sample data it is called the sample variance and denoted by $S^2$, in order to distinguish between the two. Variance is also denoted by $Var(X)$ when we speak about the variance of a random variable $X$. The symbolic definitions of the population and sample variance are
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ and $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$,
where $\mu$ and $\bar{X}$ are the population and sample means, and $N$ and $n$ are the number of observations in the population and in the sample.
It should be noted that the variance is expressed in the square of the units in which the observations are measured, so it is often a large number compared to the observations themselves. Because of its convenient mathematical properties, the variance plays an extremely important role in statistical theory.
Variance can be computed from the standard deviation, since the variance is the square of the standard deviation, i.e. $\text{Variance} = (\text{Standard Deviation})^2$.
Variance can be used to compare the dispersion in two or more sets of observations. Variance can never be negative, since every term in the variance is a squared quantity, either positive or zero.
To calculate the variance and the standard deviation, follow these steps:
- First find the mean of the data.
- Take the difference of each observation from the mean of the given data set. The sum of these differences should be zero; if it is only near zero, this is due to rounding of numbers.
- Square the differences obtained in the previous step. Each squared value is greater than or equal to zero, i.e. it is a non-negative quantity.
- Sum all the squared quantities obtained in the previous step. We call this the sum of squares of differences.
- Divide this sum of squares of differences by the total number of observations to obtain the population variance. For the sample variance, divide the sum of squares of differences by the total number of observations minus one, i.e. the degrees of freedom.
- Find the square root of the quantity obtained in the previous step. The resulting quantity is the population standard deviation ($\sigma$) or the sample standard deviation ($S$) of the given data set.
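The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the data values are assumptions made up for the example, and the `sample` flag simply switches the divisor between the number of observations and the number of observations minus one.

```python
import math


def variance(data, sample=True):
    """Variance of `data`; divides by n - 1 for a sample, by n for a population."""
    n = len(data)
    mean = sum(data) / n                            # find the mean
    deviations = [x - mean for x in data]           # differences from the mean
    sum_of_squares = sum(d ** 2 for d in deviations)  # square and sum them
    divisor = n - 1 if sample else n                # degrees of freedom or n
    return sum_of_squares / divisor


def standard_deviation(data, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(data, sample))


# Hypothetical observations, chosen only so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = variance(values, sample=False)
population_sd = standard_deviation(values, sample=False)
sample_variance = variance(values, sample=True)

print(population_variance, population_sd, sample_variance)
```

Python's standard library offers `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` for the same quantities, which may be preferable in real work.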
The major characteristics of the variance are:
a) All of the observations are used in the calculation.
b) Because the deviations are squared, the variance is strongly influenced by extreme observations.
c) The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.
Research can be classified into two groups: qualitative and quantitative research.
- Qualitative Research
Qualitative research involves collecting data from in-depth interviews, observations, field notes, open-ended questions in questionnaires, etc. The researcher is the primary data collection instrument, and the data may be collected in the form of words, images, patterns, etc. For qualitative research, data analysis involves searching for patterns, themes and holistic features. The results of such research are likely to be context specific, and reporting takes the form of a narrative with contextual description and direct quotations from research participants.
- Quantitative Research
Quantitative research involves collecting quantitative data based on precise measurement, using structured, reliable and validated collection instruments (such as questionnaires) or archival data sources. Quantitative data take the form of variables, and their analysis involves establishing statistical relationships. If properly done, the results of such research are generalizable to the entire population. Quantitative research can be classified into two groups depending on the data collection methodology:
- Experimental Research
The main purpose of experimental research is to establish a cause and effect relationship. The defining characteristics of experimental research are the active manipulation of independent variables and the random assignment of participants to the conditions being manipulated; everything else should be kept as similar and as constant as possible. The way an experiment is conducted is described by its design of experiment. There are two main types of experimental design.
- Within Subject Design
In a within subject design, the same group of subjects serves in more than one treatment.
- Between Group Design
In a between group design, two or more groups of subjects are each tested under a different testing factor simultaneously.
- Non-Experimental Research
Non-experimental research is commonly used in sociology, political science and management disciplines. This kind of research is often done with the help of a survey. There is no random assignment of participants to a particular group, nor are the independent variables manipulated. As a result, one cannot establish a cause and effect relationship through non-experimental research. There are two approaches to analyzing such data:
- Tests for significant differences across groups, such as the IQ level of participants from different ethnic backgrounds.
- Tests for significant association between two factors such as firm sales and advertising expenditure.
The chi-square test is a non-parametric test. The assumption of a normal distribution in the population is not required for this test. The chi-square technique can be used to find the association (dependency) between two or more categorical variables by comparing how close the observed frequencies are to the expected frequencies. In other words, a chi-square ($\chi^2$) statistic is used to investigate whether the distributions of categorical variables differ from one another. Note that the responses of the categorical variables should be independent of each other. We use the chi-square test for the relationship between two nominal scaled variables.
Chi-square tests are used as tests of goodness of fit and as tests of independence. In a test of goodness of fit, we check whether or not the observed frequency distribution differs from a theoretical distribution, while in a test of independence we assess whether paired observations on two variables (from a contingency table) are independent of each other.
Example: A social scientist sampled 140 people and classified them according to income level and whether or not they played a state lottery in the last month. The sample information is reported below. Is it reasonable to conclude that playing the lottery is related to income level? Use the 0.05 significance level.
| Did not play | 14 | 12 | 19 | 45 |
The step-by-step procedure for testing the hypothesis of association between these two variables is described below.
Step 1: Hypotheses
$H_0$: There is no relationship between income and whether the person played the lottery.
$H_1$: There is a relationship between income and whether the person played the lottery.
Step 2: Level of significance: $\alpha = 0.05$
Step 3: Test statistics (calculations)
| Observed Frequencies ($f_o$) | Expected Frequencies ($f_e$) |
Step 4: Critical Region:
The tabular chi-square value at the 0.05 level of significance and $(2-1)(3-1) = 2$ degrees of freedom is 5.991.
Step 5: Decision
As the calculated chi-square value is greater than the tabular chi-square value, we reject $H_0$, which means that there is a relationship between income level and playing the lottery.
Note that several types of chi-square test are available (such as the Yates correction, the likelihood ratio test, and the portmanteau test in time series); which one applies depends on the way the data were collected and on the hypothesis being tested.
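The calculation behind Steps 3 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is made up for the example, the expected frequencies use the usual (row total × column total) / grand total rule, and the "Played" row of the table is not shown in the text above, so the values used for it here are a hypothetical completion chosen so that the totals add up to the 140 people sampled. Only the "Did not play" row and the critical value 5.991 come from the example itself.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequency of a cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


observed = [
    [46, 28, 21],  # "Played": ASSUMED values, hypothetical, not given in the text
    [14, 12, 19],  # "Did not play": taken from the example
]

statistic, df, expected = chi_square_independence(observed)

CRITICAL_VALUE = 5.991  # tabular value quoted in Step 4 (0.05 level, 2 df)
reject_h0 = statistic > CRITICAL_VALUE

print(round(statistic, 3), df, reject_h0)
```

With these assumed frequencies the statistic exceeds the critical value, which is consistent with the decision in Step 5, but the exact figure depends on the assumed "Played" row.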
Mean: Measure of Central Tendency
The mean (also known as the average or arithmetic mean) is a measure of central tendency used to describe a data set by a single number (value) that represents the middle (center) of the data, that is, the average measure (performance, behaviour, etc.) of the data. A measure of central tendency is also known as a measure of central location or a measure of center.
Mathematically, the mean is defined as the sum of all the values in a given data set divided by the number of observations in that data set.
Example: Consider the following data set, which consists of the marks of 15 students in a certain examination.
50, 55, 65, 43, 78, 20, 100, 5, 90, 23, 40, 56, 70, 88, 30
The mean of the above data values is computed by adding all these values (50 + 55 + 65 + 43 + 78 + 20 + 100 + 5 + 90 + 23 + 40 + 56 + 70 + 88 + 30 = 813) and then dividing by the number of observations added (15), which equals 54.2 marks, that is
$\frac{813}{15} = 54.2$
The above procedure for calculating the mean can be represented mathematically as
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
The Greek symbol $\mu$ (pronounced “mu”) represents the population mean in statistics, and $N$ is the number of observations in the population data set.
The above formula is known as the population mean, as it is computed for the whole population. The sample mean is computed in the same manner as the population mean. The only difference is in the notation of the formula, that is,
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Here $\bar{X}$ represents the sample mean and $n$ is the number of observations in the sample.
The mean is used for numeric data only. Statistically, the data type for calculating the mean should be quantitative (variables should be measured on either a ratio or an interval scale); the numbers in the data set can therefore be continuous and/or discrete in nature.
Note that the mean should not be computed for alphabetic or categorical data (data measured on a nominal or ordinal scale). The mean is influenced by extreme values in the data, i.e. very large or very small values change the mean drastically.
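As a small sketch, the mean of the 15 marks in the example can be checked in Python. The second part of the snippet illustrates the sensitivity to extreme values; the replacement of the mark 100 by 1000 is a hypothetical change made only for this illustration.

```python
# Marks of the 15 students from the example above.
marks = [50, 55, 65, 43, 78, 20, 100, 5, 90, 23, 40, 56, 70, 88, 30]

total = sum(marks)   # sum of all values
n = len(marks)       # number of observations
mean = total / n     # arithmetic mean

print(total, n, mean)

# Hypothetical illustration: replace the mark 100 with an extreme value 1000
# and see how far the mean moves.
marks_with_outlier = [1000 if m == 100 else m for m in marks]
mean_with_outlier = sum(marks_with_outlier) / len(marks_with_outlier)

print(mean_with_outlier)
```

A single extreme value more than doubles the mean here, which is why the mean can be a poor summary of data that contain outliers.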
In probability sampling, each unit of the population has a known (non-zero) probability of being included in the sample, and samples are selected randomly by using some random selection method. That is why probability sampling may also be called random sampling. In probability sampling, samples are selected without any personal preference on the part of the investigator, and the reliability of the estimates can be determined. The advantage of probability sampling is that it provides a valid estimate of the sampling error. Probability sampling is widely used in various areas such as industry, agriculture, business sciences, etc.
Important types of probability sampling are
- Simple Random Sampling
- Stratified random sampling
- Systematic sampling
- Cluster sampling
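Three of the probability sampling methods listed above can be sketched with Python's standard `random` module. This is a rough illustration only: the function names, the population of 100 numbered units, the two strata and the 10% sampling fraction are all assumptions made up for the example, and proportional allocation is just one possible way to draw a stratified sample.

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance of selection; drawn without replacement."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from each stratum."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


rng = random.Random(42)             # fixed seed so the run is repeatable
population = list(range(1, 101))    # hypothetical population of 100 units

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

# Hypothetical strata: units 1-60 and units 61-100.
strata = {"stratum_a": population[:60], "stratum_b": population[60:]}
stratified = stratified_sample(strata, 0.1, rng)

print(srs)
print(systematic)
print(stratified)
```

In each case the selection is left to the random number generator rather than to the investigator, which is what makes the sampling error estimable.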
In non-probability sampling, samples are selected by personal judgement. Because of this personal judgement in the selection of the sample, bias may be introduced, which makes the result unrepresentative. Non-probability sampling may also be called non-random sampling. The disadvantage of non-probability sampling is that the reliability of the estimates cannot be determined.
Types of non-probability sampling are
- Purposive sampling
- Quota sampling
- Judgement sampling
- Snowball sampling
- Convenience sampling
Differences between Probability and Non-Probability Sampling
The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection of objects, while in probability sampling objects are selected by using some random selection method. This does not necessarily mean that non-probability samples are unrepresentative of the population, but it does mean that non-probability samples cannot rely on the rationale of probability theory.
In general, researchers prefer probabilistic or random sampling methods over non-probabilistic ones and consider them to be more accurate and rigorous. However, in the applied social sciences there may be circumstances where it is not possible to obtain a sample using a probability sampling method, or where random sampling is not sensible on practical or theoretical grounds. In these circumstances, a wide range of non-probability sampling methods may be considered.
It is often required to collect information about a population. There are two methods for collecting the required information.
- Complete information
In this method the required information is collected from each and every individual of the population. This method is used when it is difficult to draw a conclusion (inference) about the population on the basis of sample information. This method is costly and time consuming. This method of getting data is also called complete enumeration or population census.
- Sampling
Sampling is the most commonly and widely used method of collecting information. In this method, instead of studying the whole population, only a small part of the population is selected and studied, and the result is applied to the whole population. For example, a cotton dealer picks a small quantity of cotton from different bales in order to judge the quality of the cotton.
Purpose or objective of sampling
Two basic purposes of sampling are
- To obtain the maximum information about the population without examining each and every unit of the population.
- To find the reliability of the estimates derived from the sample, which can be done by computing standard error of the statistic.
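As a hedged illustration of the second purpose, the standard error of the sample mean is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that formula; the function name and the sample values are made up for the example.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))


# Hypothetical sample, used only to show the calculation.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = statistics.mean(sample)
se = standard_error_of_mean(sample)

print(sample_mean, round(se, 4))
```

A smaller standard error indicates that the sample mean is a more reliable estimate of the population mean; for a fixed spread in the data, it shrinks as the sample size grows.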
Advantages of sampling over complete enumeration
- It is much cheaper to collect the required information from a sample than by complete enumeration, as fewer units are studied in a sample than in the population.
- From a sample, the data can be collected more quickly, which saves a lot of time.
- Planning for a sample survey can be done more carefully and easily than for a complete enumeration.
- Sampling is the only available method of collecting the required information when examining a unit of the population destroys it.
- Sampling is the only available method of collecting the required information when the population is infinite or very large.
- The most important advantage of sampling is that it allows the reliability of the estimates to be assessed.
- Sampling is extensively used to obtain some of the census information.
Research is inquiry. It is a process of discovering new knowledge that involves multiple elements, such as theory development and testing, empirical inquiry, and sharing the generated knowledge with others such as experts and colleagues. A short description of these elements follows.
A theory is a set of ideas and perceptions that helps people to understand complex concepts and the relationships among these concepts. To develop and/or test a theory, researchers conduct empirical inquiries, collect and analyze relevant data, and discuss the findings from the empirical results. Once theories have been through the research process, it is necessary to share the results of the studies with others: researchers present papers at conferences and publish reports in journals and other publications.
The results of a study (research) may be used in two ways.
- The results may contribute to researchers’ general understanding of the topic they have studied, for example how the economy works, why price inflation happens, or which factors increase a candidate’s chances of winning an election. The generalizations that researchers draw from their studies on these issues can be shared with other researchers and the general public to advance society's understanding of the topic.
- The results of a study may contribute to solving particular problems in a nation, state, or community. For example, a study on the healthcare needs of the elderly in a community may discover that their primary need is transportation when they want to visit their doctors. The leaders of the community (such as the mayor or city council) may use this information from the healthcare study to allocate some money for the transportation needs of the elderly in the next year’s budget.
Therefore, research is a tool which builds blocks of knowledge that in turn contribute to the development of science.
Why conduct research?
- We conduct research to understand a phenomenon, situation, or behavior under study.
- We conduct research to test existing theories and to develop new theory on the basis of existing ones.
- We conduct research to answer different questions of “how”, “what”, “which”, “when” and “why” about a phenomenon, behavior, or situation.
- Research related activities contribute to forming (making) new knowledge and expand the existing knowledge base.
Nowadays one can gather information about almost anything from the Internet; just do a Google search. But is every Google search good research? Not quite! Remember that although you will find some information, it may or may not be valid or of high quality. A lot of the information available on the Internet is good and useful, but some is not, and there may be misinformation too. The information you find on the Internet may be someone's pure opinion, contain some fabrication, or be based on unsystematic research or unauthenticated information. In short, the information may not be valid (objective, true).
Therefore, a high-quality research project
- is based on the scholarly work that has been already done by others in field,
- can be replicated/ reproduced,
- is generalizable to other settings,
- is based on some logical rationale and tied to other existing theory,
- is doable, i.e. it can be done practically; when deciding the scope of research, a researcher should consider the availability of time and resources,
- generates some new questions,
- is incremental,
- is an apolitical (politically neutral) activity that should be undertaken for the betterment of society.
Two Types/Purposes of Research
Typically, there are two types/purposes of research: basic research and applied research.
- To find out truths about human behavior, societies, the economy, etc., or to understand them better. This type is called basic research.
- To answer practical questions and support informed decision making. This type is called applied research.
Note that most public administration and public policy research projects are of the second kind.