Multiple Imputation for Missing Data

Multiple imputation is an attractive method for handling missing data in multivariate analysis. The idea was first proposed by Rubin (1977).

Procedure

The procedure for conducting multiple imputation, as described by Rubin (1987), is as follows:

  • The first step is to impute the missing values by using an appropriate model that incorporates random variation.
  • The second step is to repeat the first step 3-5 times, producing that many completed data sets.
  • The third step is to perform the desired analysis on each completed data set by using standard, complete-data methods.
  • The fourth step is to average the parameter estimates across the imputed data sets in order to obtain a single point estimate.
  • The fifth step is to calculate the standard errors. The researcher first averages the squared standard errors across the imputed data sets (the within-imputation variance), then calculates the variance of the parameter estimates across the imputed data sets (the between-imputation variance), and finally combines the two quantities to obtain the standard error of the pooled estimate.
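The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data are simulated, the helper names `impute_once` and `pool_rubin` are hypothetical, and the imputation model is a simple linear regression with random residual noise. For brevity the sketch does not redraw the regression coefficients for each imputation, which a fully proper multiple imputation would do. The pooling step uses the usual combining rule, total variance = within + (1 + 1/m) × between.

```python
import numpy as np

def impute_once(x, y, rng):
    """Step 1: fill gaps in y from a regression on x plus random noise (hypothetical helper)."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid = y[obs] - (intercept + slope * x[obs])
    sigma = resid.std(ddof=2)                      # residual standard deviation
    y_imp = y.copy()
    n_mis = int((~obs).sum())
    y_imp[~obs] = intercept + slope * x[~obs] + rng.normal(0.0, sigma, n_mis)
    return y_imp

def pool_rubin(estimates, std_errors):
    """Steps 4-5: pool m estimates and standard errors (hypothetical helper)."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = len(q)
    q_bar = q.mean()                               # pooled point estimate
    within = np.mean(se ** 2)                      # average squared standard error
    between = q.var(ddof=1)                        # variance of estimates across data sets
    total = within + (1.0 + 1.0 / m) * between     # combined variance
    return q_bar, np.sqrt(total)

# Simulated data (an assumption for illustration): y depends on x, about 30% of y is missing.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan

m = 5                                              # step 2: number of imputed data sets
estimates, std_errors = [], []
for _ in range(m):
    y_imp = impute_once(x, y, rng)
    # Step 3: a standard complete-data analysis, here the mean of y and its standard error.
    estimates.append(y_imp.mean())
    std_errors.append(y_imp.std(ddof=1) / np.sqrt(n))

pooled_est, pooled_se = pool_rubin(estimates, std_errors)
print(pooled_est, pooled_se)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty that comes from not knowing the missing values.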

Features

Multiple imputation for missing data has several desirable features:

  • Because the imputation incorporates random variation, multiple imputation makes it possible for the researcher to obtain approximately unbiased estimates of all the parameters. Deterministic imputation cannot achieve this in general.
  • Multiple imputation allows the researcher to obtain good estimates of the standard errors. Single imputation does not account for the additional error introduced by the imputation itself, whereas repeating the imputation does.
  • The researcher can perform multiple imputation with any kind of data and any kind of analysis, without specialized software.
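The contrast between deterministic and random imputation can be seen in a small simulation. The following is a sketch under assumed conditions (simulated normal data, 40% of values missing completely at random, and mean imputation standing in for deterministic imputation in general); it is not drawn from the sources cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
y_full = rng.normal(loc=10.0, scale=2.0, size=1000)
missing = rng.random(1000) < 0.4          # values missing completely at random
y_obs = y_full[~missing]

# Deterministic imputation: every gap receives the same value, the observed mean.
y_det = y_full.copy()
y_det[missing] = y_obs.mean()

# Imputation with random variation: observed mean plus a random draw.
y_sto = y_full.copy()
y_sto[missing] = y_obs.mean() + rng.normal(0.0, y_obs.std(ddof=1), int(missing.sum()))

var_obs = y_obs.var(ddof=1)
var_det = y_det.var(ddof=1)               # shrunk: imputed points add no spread
var_sto = y_sto.var(ddof=1)               # spread is roughly preserved
print(var_obs, var_det, var_sto)
```

Because the deterministic fill adds observations with no spread, the variance of the completed data is understated, and any standard error computed from it is too small. Adding random variation avoids that shrinkage, and repeating the imputation lets the researcher measure how much the estimates move from one imputed data set to the next.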

However, there are certain conditions that should be satisfied before performing multiple imputation for missing data.

Conditions

These conditions are:

  • The first condition is that the data should be missing at random. In other words, the probability that a value is missing on a particular variable can depend on other observed variables, but cannot depend on the value of that variable itself.
  • The second condition is that the model used by the researcher to impute the values should be appropriate for the data.
  • The third condition is that the model used for the analysis should be compatible with the model used for the imputation.
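The first condition is easier to see with a simulated example. The sketch below is illustrative only: the variables `x` and `y`, the logistic missingness probabilities, and the sample size are all assumptions. It generates one missingness pattern that depends only on the observed variable `x` (missing at random) and one that depends on the value of `y` itself (not missing at random), then fits a regression of `y` on `x` to the observed cases under each pattern.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                    # always observed
y = x + rng.normal(size=n)                # true slope of y on x is 1

# Missing at random: the chance that y is missing depends only on x.
mar_missing = rng.random(n) < sigmoid(x)

# Not missing at random: the chance that y is missing depends on y itself.
nmar_missing = rng.random(n) < sigmoid(y)

def observed_slope(missing):
    """Slope of y on x using only the cases where y is observed."""
    keep = ~missing
    slope, _intercept = np.polyfit(x[keep], y[keep], 1)
    return slope

slope_mar = observed_slope(mar_missing)
slope_nmar = observed_slope(nmar_missing)
print(slope_mar, slope_nmar)
print(y.mean(), y[~nmar_missing].mean())
```

Under the first pattern the relationship between `y` and `x` among the observed cases matches the relationship in the full data, which is what allows an imputation model built on `x` to fill in `y` sensibly. Under the second pattern the observed cases are a distorted sample of `y`, and the observed data alone cannot reveal that distortion.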

However, it is quite easy for the researcher to violate these conditions, because in many cases the data are not missing at random.

To address cases where the data are not missing at random, the researcher can estimate a model for the missingness itself. But such models are complex and untestable, and they therefore require specialized software.

The researcher should also keep in mind that, even when the 'missing at random' condition is satisfied, the estimates obtained by multiple imputation are not always easy to interpret.
