The Z-Score
The Z-score, also referred to as a standardized score, is a useful statistic because it not only permits us to compute the probability of a raw score occurring within a normal distribution, but also allows us to compare two raw scores drawn from different normal distributions. The Z-score is a dimensionless measure, since it is derived by subtracting the population mean from an individual raw score and then dividing this difference by the population standard deviation. This computational procedure is called standardizing a raw score, and it is often used in the Z-test of hypothesis testing.
Any raw score $X$ can be converted to a Z-score by

$Z = \frac{X - \mu}{\sigma}$

where $\mu$ is the population mean and $\sigma$ is the population standard deviation.
Example 1:
If the mean $= 100$ and the standard deviation $= 10$, the Z-scores of the following raw scores are:

Raw Score   Z-Score
90          (90 - 100)/10 = -1
110         (110 - 100)/10 = 1
70          (70 - 100)/10 = -3
100         (100 - 100)/10 = 0
Note that:
 If the Z-score is zero, the raw score is equal to the population mean.
 If the Z-score is positive, the raw score is above the population mean.
 If the Z-score is negative, the raw score is below the population mean.
Example 2:
Suppose you got 80 marks on one exam of a class and 70 marks on another exam of the same class, and you are interested in finding out on which exam you performed better. Suppose further that the mean and standard deviation of exam 1 are 90 and 10, and those of exam 2 are 60 and 5, respectively. Converting both exam marks (raw scores) into standard scores (Z-scores), we get

$Z_1 = \frac{80 - 90}{10} = -1$

$Z_2 = \frac{70 - 60}{5} = 2$

The Z-score result ($Z_1 = -1$) shows that 80 marks are one standard deviation below the class mean. The Z-score result ($Z_2 = 2$) shows that 70 marks are two standard deviations above the mean. Comparing $Z_1$ and $Z_2$ shows that the student performed better on the second exam than on the first. Another way to interpret the Z-score of $-1$ is that about 34.13% of the students got marks between this score and the class average. Similarly, the Z-score of 2 implies that about 47.72% of the students got marks between the class average and this score.
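The standardization used in Example 2 can be sketched in a few lines of Python (a minimal illustration; the function name `z_score` is ours, not from the post):

```python
def z_score(x, mu, sigma):
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Exam marks from Example 2, with each exam's own mean and standard deviation.
z1 = z_score(80, 90, 10)   # (80 - 90) / 10 = -1.0
z2 = z_score(70, 60, 5)    # (70 - 60) / 5  =  2.0
print(z1, z2)
```

Because both marks are expressed on the same standardized scale, they can be compared directly even though they come from different distributions.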
Multicollinearity in Linear Regression Models
The objective of multiple regression analysis is to approximate the relationship of individual parameters of a dependency, not of an interdependency. It is assumed that the dependent variable and the regressors are linearly related to each other (Graybill, 1980; Johnston, 1963; Malinvaud, 1968). Therefore, the inferences drawn from a regression model serve to
(i) identify the relative influence of the regressors,
(ii) predict and/or estimate, and
(iii) select an appropriate set of regressors for the model.
Of all these inferences, one purpose of a regression model is to ascertain to what extent the dependent variable can be predicted by the regressors in the model. To draw suitable inferences, however, the regressors should be orthogonal, i.e., there should be no linear dependencies among the regressors. In most applications of regression analysis the regressors are not orthogonal, which leads to misleading and erroneous inferences, especially when the regressors are perfectly or nearly perfectly collinear with each other. The condition of non-orthogonality is also referred to as the problem of multicollinearity, or collinear data (for example, see Gunst and Mason, 1977; Mason et al., 1975; Ragnar, 1934). Multicollinearity is also synonymous with ill-conditioning of the regressor matrix.
The presence of interdependence, or the lack of independence, is signified by high-order intercorrelation within the set of regressors (Dorsett et al., 1983; Farrar and Glauber, 1967; Gunst and Mason, 1977; Mason et al., 1975). Perfect multicollinearity is a pathological extreme, and it can easily be detected and resolved by dropping the regressor(s) causing it (Belsley et al., 1980). In the case of perfect multicollinearity, the regression coefficients remain indeterminate and their standard errors are infinite. Similarly, perfectly collinear regressors destroy the uniqueness of the least squares estimators (Belsley et al., 1980; Belsley, 1991). When many explanatory variables (regressors/predictors) are highly collinear, it is very difficult to infer the separate influence of the collinear regressors on the response variable; that is, estimation of the regression coefficients becomes difficult, because each coefficient measures the effect of the corresponding regressor while holding all other regressors constant. Multicollinearity that is less than perfect is extremely hard to detect (Chatterjee and Hadi, 2006), as it is not a specification or modeling error but rather a condition of deficient data (Hadi and Chatterjee, 1988). On the other hand, the existence of multicollinearity has no impact on the overall regression model and its associated statistics, such as $R^2$, the $F$-ratio, and the $p$-value. Nor does multicollinearity lessen the predictive power or reliability of the regression model as a whole; it only affects the individual regressors (Koutsoyiannis, 1977). Note that multicollinearity refers only to linear relationships among the regressors; it does not rule out nonlinear relationships among them.
To draw suitable inferences from the model, the existence of (multi)collinearity should always be tested as an initial step when examining a data set for multiple regression analysis. On the other hand, high collinearity is rare, but some degree of collinearity always exists.
A distinction between collinearity and multicollinearity should be made. Strictly speaking, multicollinearity refers to the existence of more than one exact linear relationship among the regressors, while collinearity refers to the existence of a single linear relationship. Nowadays, however, multicollinearity is used to refer to both cases.
There are many methods for the detection/testing of (multi)collinearity among regressors. Applied carelessly, however, these methods can destroy the usefulness of the model, since relevant regressor(s) may be removed by them. Note that if there are only two regressors, pairwise correlation is sufficient to detect a collinearity problem. To check the severity of the collinearity problem, VIF/TOL, eigenvalues, or other diagnostic measures can be used.
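As a sketch of one such diagnostic, the variance inflation factor (VIF) can be computed from first principles with numpy. This is an illustration only, not code from the mctest package cited below; the function name `vif` and the simulated data are our own assumptions.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
    from regressing column j on the remaining columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        factors.append(1.0 / (1.0 - r2))
    return factors

# Simulated example: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
print(vif(np.column_stack([x1, x2, x3])))  # VIFs for x1, x2 are large; x3 is near 1
```

A common rule of thumb treats VIF values above 10 as a sign of a serious collinearity problem, though thresholds vary across the literature.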
For further detail, see:
 Belsley, D., Kuh, E., and Welsch, R. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, chap. 3.
 Belsley, D. A. (1991). A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1), 33-50.
 Chatterjee, S. and Hadi, A. S. (2006). Regression Analysis by Example. John Wiley & Sons, 4th edition.
 Dorsett, D., Gunst, R. F., and Gartland, E. C. J. (1983). Multicollinear Effects of Weighted Least Squares Regression. Statistics & Probability Letters, 1(4), 207-211.
 Graybill, F. (1980). An Introduction to Linear Statistical Models. McGraw-Hill.
 Gunst, R. and Mason, R. (1977). Advantages of Examining Multicollinearities in Regression Analysis. Biometrics, 33, 249-260.
 Hadi, A. and Chatterjee, S. (1988). Sensitivity Analysis in Linear Regression. John Wiley & Sons.
 Imdadullah, M., Aslam, M., and Altaf, S. (2016). mctest: An R Package for Detection of Collinearity Among Regressors.
 Johnston, J. (1963). Econometric Methods. McGraw-Hill, New York.
 Koutsoyiannis, A. (1977). Theory of Econometrics. Macmillan Education Limited.
 Malinvaud, E. (1968). Statistical Methods of Econometrics. North-Holland, Amsterdam, pp. 187-192.
 Mason, R., Gunst, R., and Webster, J. (1975). Regression Analysis and Problems of Multicollinearity. Communications in Statistics, 4(3), 277-292.
 Ragnar, F. (1934). Statistical Confluence Analysis by Means of Complete Regression Systems. Universitetets Økonomiske Institutt, Publ. No. 5.
Levels of Measurement (Scale of Measure)
Levels of measurement (scales of measure) have been classified into four categories. It is important to understand these levels of measurement, since they play an important part in determining the arithmetic operations and the statistical tests that can be carried out on the data. A scale of measure is a classification that describes the nature of the information within the numbers assigned to a variable. In simple words, the level of measurement determines how data should be summarized and presented; it also indicates the type of statistical analysis that can be performed. The four levels of measurement are described below:
1) Nominal Level of Measurement (Nominal Scale)
At the nominal level of measurement, numbers are used to classify the data (unordered groups) into mutually exclusive categories. In other words, at the nominal level of measurement, observations of a qualitative variable are measured and recorded as labels or names.
2) Ordinal Level of Measurement (Ordinal Scale)
At the ordinal level of measurement, numbers are used to classify the data (ordered groups) into mutually exclusive categories; however, the scale does not capture the relative degree of difference between categories. In other words, at the ordinal level of measurement, observations of a qualitative variable are either ranked or rated on a relative scale and recorded as labels or names.
3) Interval Level of Measurement (Interval Scale)
For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval scale is based on a scale with a known unit of measurement.
4) Ratio Level of Measurement (Ratio Scale)
Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale. Almost all quantitative variables are recorded at the ratio level of measurement.
Examples of Levels of Measurement
Examples of Nominal Level of Measurement
 Religion (Muslim, Hindu, Christian, Buddhist)
 Race (Hispanic, African, Asian)
 Language (Urdu, English, French, Punjabi, Arabic)
 Gender (Male, Female)
 Marital Status (Married, Single, Divorced)
 Number plates on Cars/ Models of Cars (Toyota, Mehran)
 Parts of Speech (Noun, Verb, Article, Pronoun)
Examples of Ordinal Level of Measurement
 Rankings (1st, 2nd, 3rd)
 Marks Grades (A, B, C, D)
 Evaluation such as High, Medium, Low
 Educational level (Elementary School, High School, College, University)
 Movie Ratings (1 star, 2 stars, 3 stars, 4 stars, 5 stars)
 Pain Ratings (more, less, no)
 Cancer Stages (Stage 1, Stage 2, Stage 3)
 Hypertension Categories (Mild, Moderate, Severe)
Examples of Interval Level of Measurement
 Temperature with Celsius scale/ Fahrenheit scale
 Level of happiness rated from 1 to 10
 Education (in years)
 Standardized tests of psychological, sociological and educational discipline use interval scales.
 SAT scores
Examples of Ratio Level of Measurement
 Height
 Weight
 Age
 Length
 Volume
 Number of home computers
 Salary
For further details visit: Levels of Measurement
Variance is a measure of the dispersion of a distribution of a random variable. The term variance was introduced by R. A. Fisher in 1918. The variance of a set of observations (data set) is defined as the mean of the squares of the deviations of all the observations from their mean. When it is computed for an entire population, the variance is called the population variance, usually denoted by $\sigma^2$, while for sample data it is called the sample variance and denoted by $S^2$, in order to distinguish between population variance and sample variance. The variance is also denoted by $Var(X)$ when we speak about the variance of a random variable. The symbolic definitions of the population and sample variance are

$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$

$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
It should be noted that the variance is expressed in the square of the units in which the observations are expressed, and the variance is a large number compared to the observations themselves. Because of some of its nice mathematical properties, the variance assumes an extremely important role in statistical theory.
The variance can be computed if we have the standard deviation, since the variance is the square of the standard deviation. The variance can be used to compare the dispersion in two or more sets of observations. The variance can never be negative, since every term in it is a squared quantity, either positive or zero.
To calculate the standard deviation, follow these steps:
 First find the mean of the data.
 Take the difference of each observation from the mean of the given data set. The sum of these differences should be zero, or near zero due to the rounding of numbers.
 Square the values obtained in the previous step; each squared value is greater than or equal to zero, i.e., a non-negative quantity.
 Sum all the squared quantities. We call this the sum of squared differences.
 Divide this sum of squared differences by the total number of observations if the population standard deviation ($\sigma$) is required. For the sample standard deviation ($S$), divide the sum of squared differences by the total number of observations minus one, i.e., the degrees of freedom.
 Find the square root of the quantity obtained in the previous step. The resulting quantity is the standard deviation of the given data set.
The major characteristics of the variance are:
a) All of the observations are used in the calculation.
b) The variance is unduly influenced by extreme observations.
c) The variance is not in the same units as the observations; it is in the square of the units in which the observations are expressed.
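The steps above can be sketched in Python (a minimal illustration on hypothetical toy data; the function name `variance` is ours):

```python
import math

def variance(data, sample=True):
    """Mean squared deviation from the mean: divide by n - 1 for a sample,
    by n for a full population."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared differences
    return sum_sq / (n - 1) if sample else sum_sq / n

marks = [4, 8, 6, 2]                         # hypothetical toy data, mean = 5
pop_var = variance(marks, sample=False)      # (1 + 9 + 1 + 9) / 4 = 5.0
std_dev = math.sqrt(pop_var)                 # the SD is the square root of the variance
```

Note how the sample version divides by the degrees of freedom (n - 1), matching the step-by-step procedure above.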
Research can be classified into two groups: qualitative and quantitative research.
 Qualitative Research
Qualitative research involves collecting data from in-depth interviews, observations, field notes, open-ended questions in questionnaires, etc. The researcher himself is the primary data-collection instrument, and the data could be collected in the form of words, images, patterns, etc. For qualitative research, data analysis involves searching for patterns, themes, and holistic features. Results of such research are likely to be context-specific, and reporting takes the form of a narrative with contextual description and direct quotations from participants.
 Quantitative Research
Quantitative research involves collecting quantitative data based on precise measurement using structured, reliable, and validated data-collection instruments (questionnaires) or archival data sources. The nature of quantitative data is in the form of variables, and its data analysis involves establishing statistical relationships. If properly done, the results of such research are generalizable to the entire population. Quantitative research can be classified into two groups depending on the data-collection methodology:
 Experimental Research
The main purpose of experimental research is to establish a cause-and-effect relationship. The defining characteristics of experimental research are the active manipulation of independent variables and the random assignment of participants to the conditions to be manipulated; everything else should be kept as similar and as constant as possible. The way experiments are conducted is described by the design of experiment. There are two main types of experimental design:
 Within-Subjects Design
In a within-subjects design, the same group of subjects serves in more than one treatment.
 Between-Subjects Design
In a between-subjects design, two or more groups of subjects are each tested under a different condition simultaneously.
 NonExperimental Research
Non-experimental research is commonly used in sociology, political science, and management disciplines. This kind of research is often done with the help of a survey. There is no random assignment of participants to a particular group, nor do we manipulate the independent variables. As a result, one cannot establish a cause-and-effect relationship through non-experimental research. There are two approaches to analyzing such data:
 Tests for significant differences between groups, such as differences in the IQ level of participants from different ethnic backgrounds.
 Tests for significant association between two factors such as firm sales and advertising expenditure.
The chi-square test is a non-parametric test; the assumption of a normal distribution in the population is not required. The chi-square technique can be used to find the association (dependence) between sets of two or more categorical variables by comparing how close the observed frequencies are to the expected frequencies. In other words, a chi-square ($\chi^2$) statistic is used to investigate whether the distributions of categorical variables differ from one another. Note that the responses of the categorical variables should be independent of each other. We use the chi-square test for the relationship between two nominal-scaled variables.
The chi-square statistic is used both in tests of goodness of fit and in tests of independence. In a test of goodness of fit, we check whether or not an observed frequency distribution differs from a theoretical distribution, while in a test of independence we assess, from a contingency table, whether paired observations on two variables are independent of each other.
Example: A social scientist sampled 140 people and classified them according to income level and whether or not they played a state lottery in the last month. The sample information is reported below. Is it reasonable to conclude that playing the lottery is related to income level? Use the 0.05 significance level.

                       Income Level
               Low   Middle   High   Total
Played          46       28     21      95
Did not play    14       12     19      45
Total           60       40     40     140
The step-by-step procedure for testing the hypothesis of association between these two variables is described below.
Step 1:
$H_0$: There is no relationship between income and whether the person played the lottery.
$H_1$: There is a relationship between income and whether the person played the lottery.
Step 2: Level of significance: 0.05
Step 3: Test statistic (calculations)

Observed Frequency (O)   Expected Frequency (E)   (O - E)^2 / E
46                       95*60/140 = 40.71        0.686
28                       95*40/140 = 27.14        0.027
21                       95*40/140 = 27.14        1.390
14                       45*60/140 = 19.29        1.449
12                       45*40/140 = 12.86        0.057
19                       45*40/140 = 12.86        2.935

$\chi^2 = \sum \frac{(O - E)^2}{E} = 6.544$
Step 4: Critical region
The tabular chi-square value at the 0.05 level of significance with $(2-1)(3-1) = 2$ degrees of freedom is 5.991.
Step 5: Decision
As the calculated chi-square value (6.544) is greater than the tabular chi-square value (5.991), we reject $H_0$, which means that there is a relationship between income level and playing the lottery.
Note that there are several types of chi-square tests (such as Yates' correction, the likelihood ratio test, and the portmanteau test in time series); which is appropriate depends on the way the data were collected and on the hypothesis being tested.
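The calculation in Steps 3 and 4 can be reproduced with a short Python sketch (numpy only; the variable names are ours):

```python
import numpy as np

# The lottery example above: expected counts are
# (row total * column total) / grand total, and the statistic is
# the sum of (O - E)^2 / E over all cells.
observed = np.array([[46, 28, 21],
                     [14, 12, 19]], dtype=float)

row_tot = observed.sum(axis=1, keepdims=True)   # 95, 45
col_tot = observed.sum(axis=0, keepdims=True)   # 60, 40, 40
expected = row_tot @ col_tot / observed.sum()   # outer product / grand total

chi2 = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(round(chi2, 3), df)   # 6.544 with df = 2, matching the hand calculation
```

The same result could be obtained from a library routine such as scipy's chi-square test for contingency tables; the manual version is shown here to mirror the steps above.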
Mean: Measure of Central Tendency
The measure of central tendency known as the mean (also known as the average or arithmetic mean) is used to describe a data set with a single number (value) that represents the middle (center) of the data, that is, the average measure (performance, behaviour, etc.) of the data. A measure of central tendency is also known as a measure of central location or measure of center.
Mathematically, the mean can be defined as the sum of all the values in a given data set divided by the number of observations in that data set.
Example: Consider the following data set consisting of the marks of 15 students in a certain examination.
50, 55, 65, 43, 78, 20, 100, 5, 90, 23, 40, 56, 70, 88, 30
The mean of the above data values is computed by adding all these values (50 + 55 + 65 + 43 + 78 + 20 + 100 + 5 + 90 + 23 + 40 + 56 + 70 + 88 + 30 = 813) and then dividing by the number of observations added (15), which equals 54.2 marks, that is,

$\mu = \frac{813}{15} = 54.2$

The above procedure for calculating the mean can be represented mathematically as

$\mu = \frac{\sum X_i}{N}$

The Greek symbol $\mu$ (pronounced "mu") is the representation of the population mean in statistics, and $N$ is the number of observations in the population data set.
The above formula is known as the population mean, as it is computed for the whole population. The sample mean is computed in the same manner as the population mean; only the representation in the formula differs, that is,

$\bar{X} = \frac{\sum X_i}{n}$

Here $\bar{X}$ is the representation of the sample mean, and $n$ is the number of observations in the sample.
The mean is used for numeric data only. Statistically, the data type for calculating the mean should be quantitative (variables should be measured on either the ratio or the interval scale); the numbers in the data set can therefore be continuous and/or discrete in nature.
Note that the mean should not be computed for alphabetic or categorical data (data belonging to the nominal or ordinal scale). The mean is influenced by very extreme values in the data, i.e., very large or very small values change the mean drastically.
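The computation above can be sketched in Python using the same 15 marks (a minimal illustration):

```python
# Marks of 15 students from the example above.
marks = [50, 55, 65, 43, 78, 20, 100, 5, 90, 23, 40, 56, 70, 88, 30]

# Mean = sum of all values divided by the number of observations.
mean = sum(marks) / len(marks)
print(mean)   # 813 / 15 = 54.2
```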
For other measures of central tendency visit: Measures of Central Tendencies
Probability sampling
In probability sampling, each unit of the population has a known (non-zero) probability of being included in the sample, and samples are selected randomly by using some random selection method; that is why probability sampling may also be called random sampling. In probability sampling, the reliability of the estimates can be determined, and samples are selected without personal bias. The advantage of probability sampling is that it provides valid estimates of the sampling error. Probability sampling is widely used in various areas such as industry, agriculture, and the business sciences.
Important types of probability sampling are
 Simple Random Sampling
 Stratified random sampling
 Systematic sampling
 Cluster sampling
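As an illustration of the first and third types listed above, here is a minimal Python sketch of simple random and systematic sampling (the sampling frame of 100 numbered units is hypothetical):

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame of 100 units

# Simple random sampling: every unit has the same known inclusion probability,
# and selection is without replacement.
srs = random.sample(population, k=10)

# Systematic sampling: choose a random start, then take every k-th unit.
k = len(population) // 10          # sampling interval
start = random.randrange(k)
systematic = population[start::k]

print(srs)
print(systematic)
```

In both cases every unit's probability of selection is known in advance, which is what makes the reliability of the resulting estimates determinable.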
Nonprobability sampling
In non-probability sampling, samples are selected by personal judgement; because of this personal judgement in the selection of the sample, bias may be introduced, which can make the results unrepresentative. Non-probability sampling may also be called non-random sampling. The disadvantage of non-probability sampling is that the reliability of the estimates cannot be determined.
Types of nonprobability sampling are
 Purposive sampling
 Quota sampling
 Judgement sampling
 Snowball sampling
 Convenience sampling
Differences between Probability and NonProbability Sampling
The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection of objects, while in probability sampling objects are selected using some random selection method. This does not necessarily mean that non-probability samples are unrepresentative of the population, but it does mean that non-probability samples cannot depend upon the rationale of probability theory.
In general, researchers may prefer probabilistic (random) sampling methods over non-probabilistic ones and consider them to be more accurate and rigorous. However, in applied social science research there may be circumstances where it is not possible to obtain a sample using probability sampling methods, or where random sampling is not practically or theoretically sensible. In these circumstances, a wide range of non-probability sampling methods may be considered.
It is often required to collect information about a population. There are two methods for collecting the required information:
 Complete information
 Sampling
Complete Information
In this method, the required information is collected from each and every individual of the population. This method is used when it is difficult to draw a conclusion (inference) about the population on the basis of sample information. This method is costly and time-consuming. This method of collecting data is also called complete enumeration or a population census.
Sampling
Sampling is the most commonly and widely used method of collecting information. In this method, instead of studying the whole population, only a small part of the population is selected and studied, and the results are applied to the whole population. For example, a cotton dealer picks up a small quantity of cotton from different bales in order to judge the quality of the cotton.
Purpose or objective of sampling
Two basic purposes of sampling are
 To obtain the maximum information about the population without examining each and every unit of the population.
 To find the reliability of the estimates derived from the sample, which can be done by computing standard error of the statistic.
Advantages of sampling over complete enumeration
 It is a much cheaper method of collecting the required information compared to complete enumeration, as fewer units are studied in a sample than in the whole population.
 From a sample, the data can be collected more quickly, saving a lot of time.
 Planning for a sample survey can be done more carefully and easily than for a complete enumeration.
 Sampling is the only available method of collecting the required information when observing the population objects/subjects/individuals is destructive in nature.
 Sampling is the only available method of collecting the required information when the population is infinite or very large.
 An important advantage of sampling is that it provides a measure of the reliability of the estimates.
 Sampling is extensively used to obtain some of the census information.
For further reading visit: Sampling Theory and Reasons to Sample
Research is inquiry. It is a process of discovering new knowledge that involves multiple elements, such as theory development and testing, empirical inquiry, and sharing the generated knowledge with others such as experts and colleagues. A short description of these elements follows:
Theory is a set of ideas and perceptions that helps people understand complex concepts and the relationships among these concepts. To develop and/or test a theory, researchers conduct empirical inquiries, collect and analyze relevant data, and discuss the findings from the empirical results. Once theories have been through the research process, it is necessary to share the results of the studies with others; for example, researchers present papers at conferences and publish reports in journals and other publications.
The results of a study (research) may be used in two ways.
 The results may contribute to researchers' general understanding of the topic they have studied, i.e., may contribute to understanding how the economy works, why price inflation happens, which factors increase a candidate's chances of winning an election, etc. The generalizations that researchers draw from their studies on these issues can be shared with other researchers and the general public to advance society's understanding of the topic.
 The results of a study may contribute to solving particular problems in a nation, state, or community. For example, a study on the healthcare needs of the elderly in a community may discover that their primary need is finding transportation when they want to visit their doctors. The leaders of the community (such as the mayor or city council) may use this information from the healthcare study to allocate money for the transportation needs of the elderly in the next year's budget.
Therefore, research is a tool that builds the blocks of knowledge, which in turn contribute to the development of science.
Why conduct research?
 We conduct research to understand a phenomenon, situation, or behavior under study.
 We conduct research to test existing theories and to develop new theory on the basis of existing ones.
 We conduct research to answer different questions of “how”, “what”, “which”, “when” and “why” about a phenomenon, behavior, or situation.
 Research related activities contribute to forming (making) new knowledge and expand the existing knowledge base.
HighQuality Research
Nowadays one can collect/gather information about almost anything from the Internet with just a Google search, but the question is: is every Google search good research? Not quite! Remember that although you will find information, it may or may not be valid or high-quality information. A lot of the information available on the Internet is good and useful, but some is not, and there may be misinformation too. The information you find on the Internet may be someone's pure opinion, have some fabrication in it, or be based on unsystematic research or unauthentic information. In short, the information may not be valid (objective, true).
Therefore, a high-quality research project
 is based on the scholarly work that has already been done by others in the field,
 can be replicated/reproduced,
 is generalizable to other settings,
 is based on some logical rationale and tied to other existing theory,
 is doable, i.e., can be done practically; when deciding the scope of research, a researcher should consider the availability of time and resources,
 generates some new questions,
 is incremental,
 is an apolitical (politically neutral) activity that should be undertaken for the betterment of society.
Two Types/Purposes of Research
Typically, there are two types/purposes of research: Basic Research and Applied Research
 To find out truths regarding human behaviors, societies, the economy, etc., or to understand them better. This type is called basic research.
 To answer practical questions and support making informed decisions. This type is called applied research.
Note that most public administration and public policy research projects are of the second kind.