Researchers need to handle missing values carefully. If missing values are handled improperly, the researcher may draw inaccurate inferences from the data, because the results obtained can differ from those that would have been obtained had the data been complete.
A researcher therefore needs to understand the concept of missing values fully.
Item nonresponse occurs when a respondent does not answer certain questions because of stress, fatigue, or lack of knowledge. A respondent may also skip questions that are sensitive. Both lead to missing values.
Missing values also arise in research on past records, when the researcher finds that certain values were never recorded.
There is no fixed rule for handling missing values. The researcher may leave the data as they are, drop the affected cases, or use data imputation to replace the missing values. The choice depends largely on the researcher’s experience in dealing with missing values.
If the number of cases with missing values is extremely small, an experienced researcher may drop or omit those cases from the analysis. A common rule of thumb is that if fewer than 5% of the cases in the sample have missing values, the researcher can drop them.
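The 5% rule of thumb can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the records, the field names `age` and `income`, and the helper functions are all hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical survey records; None marks a missing answer.
records = [{"age": 20 + i, "income": 30000 + 1000 * i} for i in range(25)]
records[7]["income"] = None  # one respondent skipped the income question

THRESHOLD = 0.05  # rule of thumb from the text, not a strict statistical law


def is_complete(row):
    """True if the row has no missing (None) values."""
    return all(value is not None for value in row.values())


def missing_case_fraction(rows):
    """Share of rows that have at least one missing value."""
    incomplete = sum(1 for row in rows if not is_complete(row))
    return incomplete / len(rows)


fraction = missing_case_fraction(records)

if fraction < THRESHOLD:
    # Few enough incomplete cases: drop them from the analysis.
    analysis_rows = [row for row in records if is_complete(row)]
else:
    # Too many incomplete cases: keep everything and consider imputation.
    analysis_rows = records

print(f"{fraction:.0%} of cases are incomplete; analysing {len(analysis_rows)} rows")
```

Here 1 of 25 cases is incomplete, so the case falls under the threshold and is dropped.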
In multivariate analysis, if there is a larger number of missing values, it can be better to drop the affected cases than to use data imputation to replace the missing values. In univariate analysis, on the other hand, data imputation can reduce the amount of bias in the data if the values are missing at random.
There are two forms of randomly missing values:
- MCAR
- MAR
The first form of randomly missing values is missing completely at random (MCAR). Data are MCAR when the missing values are randomly distributed across all observations. This can be checked by partitioning the data into two parts: the cases that contain missing values and the cases that do not. A t-test of mean difference is then carried out on the other variables to check whether the two groups differ. If they do not differ significantly, the data are consistent with MCAR.
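The partition-and-compare check described above can be sketched as follows. The data are made up, the variable names are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption of this sketch, since the text does not say which variant is meant. Only the standard library is used, so the code reports the t statistic and degrees of freedom without a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical (age, income) records; None marks a missing income.
rows = [
    (25, None), (27, 41000), (31, None), (33, 52000), (28, None),
    (29, 45000), (35, None), (36, 60000), (24, 38000), (30, None),
    (31, 47000), (30, 46000), (28, 43000),
]

# Partition the fully observed variable (age) by whether income is missing.
age_when_missing = [age for age, income in rows if income is None]
age_when_observed = [age for age, income in rows if income is not None]


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


t_stat, df = welch_t(age_when_missing, age_when_observed)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A t statistic close to zero, as in this made-up example, gives no evidence that the two groups differ in age. Such a result is consistent with MCAR but does not prove it, and the check would normally be repeated for each fully observed variable.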
If the data are MCAR, the researcher may choose pair-wise or list-wise deletion of the cases with missing values. If the data are not MCAR, data imputation is used to replace the missing values.
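The difference between the two deletion strategies can be illustrated with a small sketch. The variables `x`, `y`, and `z` and the helper names are hypothetical. List-wise deletion keeps only rows that are complete on every variable, while pair-wise deletion keeps, for each pair of variables, every row where both are observed.

```python
# Hypothetical data set with three variables; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 5.0},
    {"x": 3.0, "y": 4.0, "z": None},
    {"x": 4.0, "y": 6.0, "z": 9.0},
    {"x": None, "y": 8.0, "z": 11.0},
]


def listwise(rows):
    """Keep only the rows with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, a, b):
    """Keep the (a, b) pairs from rows where both variables are observed."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


complete_rows = listwise(rows)
xz_pairs = pairwise(rows, "x", "z")
yz_pairs = pairwise(rows, "y", "z")

print(len(complete_rows), "complete rows for list-wise analysis")
print(len(xz_pairs), "pairs available for an x-z statistic")
print(len(yz_pairs), "pairs available for a y-z statistic")
```

Pair-wise deletion retains more data, but each statistic is then based on a different subset of cases.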
The second form of randomly missing values is missing at random (MAR). In MAR, the missing values are not randomly distributed across all observations but are randomly distributed within one or more subsamples. This form of missing values is more common than MCAR.
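One informal way to see the subsample pattern that characterises MAR is to compare the rate of missing values across groups defined by an observed variable. The sketch below uses made-up data and hypothetical names (`age_group`, `income`, `missing_rate_by_group`). Very different rates across groups suggest that missingness depends on the grouping variable.

```python
from collections import defaultdict

# Hypothetical records: income is missing more often for younger respondents.
rows = [
    {"age_group": "under_30", "income": None},
    {"age_group": "under_30", "income": 31000},
    {"age_group": "under_30", "income": None},
    {"age_group": "under_30", "income": None},
    {"age_group": "under_30", "income": 28000},
    {"age_group": "30_plus", "income": 52000},
    {"age_group": "30_plus", "income": 61000},
    {"age_group": "30_plus", "income": None},
    {"age_group": "30_plus", "income": 58000},
    {"age_group": "30_plus", "income": 49000},
]


def missing_rate_by_group(rows, group_key, value_key):
    """Fraction of missing values of value_key within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        counts[row[group_key]][1] += 1
        if row[value_key] is None:
            counts[row[group_key]][0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}


rates = missing_rate_by_group(rows, "age_group", "income")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of income values missing")
```

In this example the missing rate is three times higher in the younger group, so the values are not missing completely at random, though they may still be missing at random within each age group.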
Non-ignorable missingness is the most problematic form. It involves missing values that are not randomly distributed across the observations and whose probability cannot be predicted from the variables in the model. It can be addressed by performing data imputation to replace the missing values.
SPSS provides several statistical techniques for handling or estimating missing values. These include regression, maximum likelihood estimation, list-wise or pair-wise deletion, approximate Bayesian bootstrap, and multiple imputation, among others.
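To make the regression approach concrete, the following is a minimal Python sketch of single regression imputation with one predictor. It is an illustration of the general idea only, not the SPSS procedure: the data and the helper names `fit_line` and `regression_impute` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def regression_impute(pairs):
    """Replace each missing y (None) with the value predicted from x."""
    observed = [(x, y) for x, y in pairs if y is not None]
    slope, intercept = fit_line(
        [x for x, _ in observed], [y for _, y in observed]
    )
    return [
        (x, y if y is not None else intercept + slope * x) for x, y in pairs
    ]


# Hypothetical (x, y) data; the observed points lie on the line y = 2x + 1.
pairs = [(1, 3.0), (2, 5.0), (3, None), (4, 9.0), (5, None)]
imputed = regression_impute(pairs)

for x, y in imputed:
    print(x, round(y, 2))
```

Because every imputed value falls exactly on the fitted line, single regression imputation understates the variability of the data. Multiple imputation addresses this by creating several completed data sets and combining the results.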


