Multicollinearity

The term multicollinearity was first used by Ragnar Frisch. It describes a perfect or exact linear relationship among the explanatory (regressor) variables of a regression model. Linear regression analysis assumes that there is no perfect or exact relationship among the explanatory variables. In regression analysis, when this assumption is violated, the problem of multicollinearity occurs.
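
As a minimal sketch of what a perfect (exact) relationship among explanatory variables looks like in practice, the NumPy example below builds a design matrix in which one column is an exact linear combination of two others; the variable names and numbers are illustrative only. The cross-product matrix X'X then loses rank and cannot be inverted, which is why ordinary least squares breaks down under perfect multicollinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + 3 * x2  # x3 is an exact linear combination of x1 and x2

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2, x3])

# X'X is singular: its rank falls short of the number of columns,
# so the usual OLS solution involving the inverse of X'X does not exist.
print(np.linalg.matrix_rank(X.T @ X))  # 3, although X has 4 columns
print(np.linalg.cond(X.T @ X))         # a huge condition number
```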

Statistics Solutions is the country’s leader in dissertation statistical consulting and can assist with your regression analysis.

In regression analysis, multicollinearity has the following types:

1. None: When the explanatory variables have no relationship with each other, there is no multicollinearity in the data.
2. Low: When there is a relationship among the explanatory variables but it is very weak, the multicollinearity is low.
3. Moderate: When the relationship among the explanatory variables is moderate, the multicollinearity is said to be moderate.
4. High: When the relationship among the explanatory variables is strong, the multicollinearity is said to be high.
5. Very high (exact): When the relationship among the explanatory variables is exact, the problem is one of perfect multicollinearity, and it must be removed from the data before regression analysis is conducted.

Many factors can cause multicollinearity. It may arise from the data collection process, or from a wrong specification of the model. For example, if we include both income and house size as explanatory variables, the model will suffer from multicollinearity because income and house size are highly correlated. Multicollinearity may also occur when too many explanatory variables are included in the regression analysis.
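
To make the income and house size example concrete, here is a small illustrative simulation (the figures are invented for illustration, not taken from the text); it shows that when one explanatory variable largely determines another, their sample correlation is close to one, which is the source of the multicollinearity problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical figures: house size is largely driven by income, plus noise.
income = rng.normal(loc=60_000, scale=15_000, size=n)
house_size = 0.02 * income + rng.normal(scale=100, size=n)

# A sample correlation close to 1 means the two regressors carry
# almost the same information.
print(np.corrcoef(income, house_size)[0, 1])
```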

Consequences: If the data have a severe (near-exact) multicollinearity problem, the following will be its impact:

1. In the presence of multicollinearity, the variances and covariances of the estimated coefficients become larger, which makes it difficult to reach a statistical decision about the null and alternative hypotheses.
2. In the presence of multicollinearity, the confidence intervals become wider because of these larger variances. As a result, we may fail to reject a null hypothesis that should be rejected.
3. In the presence of multicollinearity, the standard errors increase, which makes the t-statistics smaller. Again, we may fail to reject a null hypothesis that should be rejected.
4. The R-square of the model may remain high even though few individual coefficients are significant, which gives a misleading picture of the goodness of fit of the model, as the sketch following this list illustrates.
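
The sketch below, which assumes the statsmodels package and simulated data, illustrates these consequences: two nearly identical regressors produce a model with a high R-square but inflated standard errors and small t-statistics on the individual coefficients.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # x2 is almost a copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

# High R-square, yet inflated standard errors and small t-statistics
# on the individual (collinear) coefficients.
print(model.rsquared)
print(model.bse)      # standard errors of the coefficients
print(model.tvalues)  # t-statistics of the coefficients
```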

Detection: The following are the methods that show the presence of multicollinearity:

1. In regression analysis, when the R-square of the model is very high but there are very few significant t-ratios, this indicates multicollinearity in the data.
2. A high pairwise correlation between explanatory variables also indicates the problem of multicollinearity.
3. Tolerance and the variance inflation factor (VIF): In regression analysis, the VIF of an explanatory variable is one divided by one minus the R-square obtained from regressing that variable on all the other explanatory variables, i.e. VIF = 1 / (1 − R²). As the correlation among the regressor variables increases, the VIF also increases; a larger VIF indicates stronger multicollinearity. The reciprocal of the VIF is called the tolerance (TOL = 1 − R² = 1 / VIF), so a low tolerance carries the same warning as a high VIF, as the sketch below shows.
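
As a short sketch of this diagnostic (assuming the statsmodels package), the code below computes the VIF and the corresponding tolerance for each explanatory variable using statsmodels' variance_inflation_factor helper; the simulated data are illustrative only.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # highly correlated with x1
x3 = rng.normal(size=n)                  # unrelated to x1 and x2

X = sm.add_constant(np.column_stack([x1, x2, x3]))

# Column 0 of X is the constant; report VIF and tolerance for each regressor.
for j, name in enumerate(["x1", "x2", "x3"], start=1):
    vif = variance_inflation_factor(X, j)
    print(name, "VIF:", round(vif, 2), "tolerance:", round(1.0 / vif, 4))
```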

Remedial measures: In regression analysis, the first step is to detect multicollinearity. If it is present in the data, the problem can be addressed in several ways:

1. Drop one of the collinear variables, bearing in mind that dropping a variable that belongs in the model can introduce specification bias.
2. Combine (pool) the cross-sectional data and the time-series data.
3. Transform the variables, for example by taking first or second differences.
4. Add new data or increase the sample size.
5. In multivariate analysis, replace the collinear variables with a common score; in factor analysis, principal component analysis is used to derive such a common score from the collinear variables (see the sketch below).

A common rule of thumb is that a VIF greater than 10 indicates a serious multicollinearity problem.
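
To illustrate the common-score remedy, here is a hedged sketch that uses principal component analysis (via scikit-learn, which the text does not mention) to replace two collinear variables with a single component score that can stand in for both in the regression; the data are simulated for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n = 200

income = rng.normal(loc=60_000, scale=15_000, size=n)
house_size = 0.02 * income + rng.normal(scale=100, size=n)

# Standardize the collinear variables, then extract one principal component
# to serve as their common score.
Z = np.column_stack([income, house_size])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

pca = PCA(n_components=1)
common_score = pca.fit_transform(Z).ravel()

# The first component captures most of the shared variation, so this single
# score can replace the two collinear regressors in the model.
print(pca.explained_variance_ratio_)
print(common_score[:5])
```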

Contact Statistics Solutions today for more information on multicollinearity.