Multicollinearity

The term multicollinearity was first used by Ragnar Frisch. It describes an exact or near-exact linear relationship among the explanatory variables in a regression model. Linear regression analysis assumes that there is no exact linear relationship among the explanatory variables; when this assumption is violated, the problem of multicollinearity occurs.

Types of multicollinearity in regression analysis:

1. None: The explanatory variables have no relationship with each other.
2. Low: There is a relationship among the explanatory variables, but it is very weak.
3. Moderate: The relationship among the explanatory variables is moderate.
4. High: The relationship among the explanatory variables is strong, though not exact.
5. Very high (perfect): The relationship among the explanatory variables is exact. This is the most serious case and must be resolved before the regression analysis is conducted.

Many factors can cause multicollinearity. It may arise during the data collection process, or it may result from wrong specification of the model. For example, if we take income and house size as explanatory variables in the same model, the model will suffer from multicollinearity because income and house size are highly correlated. It may also occur when too many explanatory variables are included in the regression analysis.
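
Below is a minimal sketch of that example with made-up numbers: house size is simulated so that it is driven largely by income, and the resulting correlation between the two predictors comes out high. The figures (60,000, 15,000, 0.02, 150) are illustrative assumptions, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
income = rng.normal(60_000, 15_000, n)               # simulated annual income
house_size = 0.02 * income + rng.normal(0, 150, n)   # size largely driven by income

r = np.corrcoef(income, house_size)[0, 1]
print(f"correlation(income, house_size) = {r:.2f}")  # high -> multicollinearity risk
```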

Consequences

If the data have a perfect or near-perfect multicollinearity problem, the following consequences arise:

1. In the presence of multicollinearity, the variances and covariances of the coefficient estimates become large, which makes it difficult to reach a statistical decision about the null and alternative hypotheses.
2. The confidence intervals become wider because of the inflated standard errors. As a result, we may fail to reject a null hypothesis that should be rejected.
3. The standard errors increase, which makes the t statistics smaller; again, we may fail to reject a null hypothesis that should be rejected (the simulation after this list illustrates these effects).
4. Multicollinearity can also inflate the R-square, which distorts the apparent goodness of fit of the model.
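
The following small simulation is an illustrative sketch, not the author's own example: two nearly identical regressors produce a high R-square together with large standard errors and small t statistics, matching the consequences listed above. The variable names and coefficients are assumptions made for the demonstration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)         # x2 is almost an exact copy of x1
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))   # design matrix with intercept
fit = sm.OLS(y, X).fit()

print("R-squared:", round(fit.rsquared, 3))      # overall fit stays high
print("std errors:", fit.bse.round(2))           # inflated for both regressors
print("t values:", fit.tvalues.round(2))         # individually small despite high R-squared
```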

Detection

1. In regression analysis, when the R-square of the model is very high but few of the t ratios are statistically significant, this indicates multicollinearity in the data.
2. High pairwise correlations between the explanatory variables also indicate the problem of multicollinearity.
3. Tolerance and variance inflation factor (VIF): For each explanatory variable, the VIF is one divided by one minus the R-square obtained from regressing that variable on the remaining explanatory variables, i.e. VIF = 1 / (1 - Rj^2). As the correlation among the regressor variables increases, the VIF increases, and a larger VIF indicates the presence of multicollinearity; a common rule of thumb is that a VIF greater than 10 signals a problem. The reciprocal of the VIF is called the tolerance, so VIF and tolerance convey the same information in inverse form. A sketch of this check appears after this list.
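
Below is a minimal sketch of the VIF/tolerance check, using simulated data and the variance_inflation_factor helper from statsmodels; the variable names and the strength of the correlation are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)    # strongly correlated regressors
X = sm.add_constant(np.column_stack([x1, x2]))   # design matrix with intercept

for j, name in zip((1, 2), ("x1", "x2")):
    vif = variance_inflation_factor(X, j)        # 1 / (1 - Rj^2)
    tol = 1.0 / vif                              # tolerance is the reciprocal of VIF
    print(f"{name}: VIF = {vif:.2f}, tolerance = {tol:.2f}")
```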

Remedial measures

In regression analysis, the first step is to detect multicollinearity. If it is present in the data, several remedial steps can be taken:

1. Drop one of the collinear variables, bearing in mind that dropping a relevant variable can introduce specification bias.
2. Combine the cross-sectional data and the time-series data (pooling), which can remove the multicollinearity.
3. Transform the variables, for example by taking first or second differences.
4. Add new data or increase the sample size.
5. In multivariate analysis, replace the collinear variables with a common score; in factor analysis, principal component analysis is used to derive the common score of the collinear variables. A sketch of this approach follows the list.
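
As an illustration of the common-score remedy above, the sketch below collapses two nearly collinear regressors into a single principal-component score before refitting. The use of scikit-learn's PCA and the simulated variables are assumptions made for this example, not a procedure prescribed by the text.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)          # nearly collinear pair
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Replace the collinear pair with its first principal-component score.
score = PCA(n_components=1).fit_transform(np.column_stack([x1, x2])).ravel()

fit = sm.OLS(y, sm.add_constant(score)).fit()
print("coefficient on score:", fit.params[1].round(2))
print("standard error:", fit.bse[1].round(2))     # stable, unlike the collinear fit
```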
