Missing Values in Data
The concept of missing values is important to understand in order to manage data successfully. If the researcher does not handle missing values properly, he or she may draw inaccurate inferences about the data. With improper handling, the results obtained will differ from those that would have been obtained had no values been missing.
Item non-response occurs when a respondent does not answer certain questions because of stress, fatigue, or lack of knowledge. A respondent may also skip questions that are sensitive. These unanswered items are treated as missing values.
Handling Missing Values
The researcher may either leave the data as they are or use data imputation to replace the missing values. If the number of cases with missing values is extremely small, an expert researcher may drop or omit those cases from the analysis. As a rule of thumb, if the cases with missing values make up less than 5% of the sample, the researcher can drop them.
In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute replacement values. In univariate analysis, on the other hand, imputation can decrease the amount of bias in the data if the values are missing at random.
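The 5% rule of thumb can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names `age` and `income`, and the helper names `share_incomplete` and `drop_if_few_missing` are hypothetical, and `None` stands in for a missing answer.

```python
def share_incomplete(rows):
    """Return the fraction of rows that have at least one missing (None) value."""
    if not rows:
        return 0.0
    incomplete = sum(1 for row in rows if any(v is None for v in row.values()))
    return incomplete / len(rows)


def drop_if_few_missing(rows, threshold=0.05):
    """Drop incomplete rows only when they are less than `threshold` of the sample."""
    if share_incomplete(rows) < threshold:
        return [row for row in rows if all(v is not None for v in row.values())]
    return rows  # too many incomplete cases: leave the data for another strategy


# Hypothetical sample: 40 respondents, one of whom skipped the income question.
sample = [{"age": 20 + i, "income": 30000 + 500 * i} for i in range(40)]
sample[7]["income"] = None

cleaned = drop_if_few_missing(sample)
print(share_incomplete(sample))  # 0.025
print(len(cleaned))              # 39
```

When the share of incomplete cases reaches the threshold, the function returns the rows unchanged so that a different strategy, such as imputation, can be applied.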
There are two forms of randomly missing values:
- MCAR: Missing completely at random
- MAR: Missing at random
The first form is missing completely at random (MCAR). This form exists when the missing values are randomly distributed across all observations. It can be checked by partitioning the data into two parts: one set containing the cases with missing values, and the other containing the cases without missing values. After partitioning the data, the most popular test, the t-test of mean difference, is carried out to check whether the two sets differ on the other variables in the sample.
The researcher should keep in mind that if the data are MCAR, then list-wise or pair-wise deletion of the cases with missing values may be chosen. If, however, the data are not MCAR, then imputation is used to replace the missing values.
The second form is missing at random (MAR). In MAR, the missing values are not randomly distributed across all observations but are randomly distributed within one or more sub-samples. This form is more common than the previous one.
The non-ignorable missing value is the most problematic form. It involves missing values that are not randomly distributed across the observations, and the probability that a value is missing cannot be predicted from the variables in the model. Such values cannot simply be ignored; data imputation is performed to replace them.
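The partition-and-compare check described above might look like the following sketch. It assumes Welch's unequal-variance form of the t-test, which is one common choice and not necessarily the variant the text has in mind; the data and the helper name `welch_t` are hypothetical. Only the t statistic and approximate degrees of freedom are computed, so the result would still need to be compared against a t distribution.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard errors of the two group means.
    se_a = statistics.variance(group_a) / len(group_a)
    se_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical (age, income) pairs; None marks a missing income.
rows = [(25, 31000), (31, None), (38, 45000), (42, None), (29, 36000),
        (55, 61000), (47, None), (33, 40000), (51, 58000), (36, None)]

# Partition on whether income is missing, then compare the observed variable age.
age_when_income_missing = [age for age, income in rows if income is None]
age_when_income_observed = [age for age, income in rows if income is not None]

t, df = welch_t(age_when_income_missing, age_when_income_observed)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A t statistic close to zero is consistent with MCAR for the compared variable, but it does not prove it; a large absolute value suggests that the two sets differ.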
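The difference between the two deletion strategies can be illustrated with a small sketch. The records and the helper names `listwise`, `pairwise`, and `covariance` are hypothetical, and `None` marks a missing value. List-wise deletion keeps only fully observed cases, while pair-wise deletion keeps, for each pair of variables, every case in which both are observed.

```python
# Hypothetical records with three variables; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 5.0},
    {"x": 3.0, "y": 6.0, "z": None},
    {"x": 4.0, "y": 8.0, "z": 9.0},
    {"x": 5.0, "y": 10.0, "z": 11.0},
]


def listwise(rows):
    """Keep only the rows in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, a, b):
    """Keep the (a, b) values of rows in which both variables are observed."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


def covariance(pairs):
    """Sample covariance of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(p[0] for p in pairs) / n
    mean_b = sum(p[1] for p in pairs) / n
    return sum((p[0] - mean_a) * (p[1] - mean_b) for p in pairs) / (n - 1)


print(len(listwise(rows)))  # 3 complete cases

# Each covariance uses its own subset of cases, so n varies by pair.
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    pairs = pairwise(rows, a, b)
    print(a, b, len(pairs), covariance(pairs))
```

Pair-wise deletion retains more information per statistic, at the cost of each statistic being based on a different set of cases.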
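One rough way to see whether missing values concentrate in particular sub-samples is to tabulate the share of missing values per group. The sketch below assumes hypothetical survey rows with a `group` label and an `income` field, with `None` marking a missing answer; the helper name `missing_rate_by_group` is also an assumption.

```python
from collections import defaultdict


def missing_rate_by_group(rows, group_key, value_key):
    """Return {group: share of rows whose value_key is missing (None)}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        counts[row[group_key]][1] += 1
        if row[value_key] is None:
            counts[row[group_key]][0] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}


# Hypothetical survey: income is skipped more often in sub-sample B.
rows = [
    {"group": "A", "income": 41000}, {"group": "A", "income": 39000},
    {"group": "A", "income": 52000}, {"group": "A", "income": None},
    {"group": "B", "income": None}, {"group": "B", "income": None},
    {"group": "B", "income": 47000}, {"group": "B", "income": None},
]

print(missing_rate_by_group(rows, "group", "income"))  # {'A': 0.25, 'B': 0.75}
```

Clearly different rates across groups suggest that the data are not MCAR. Whether they are MAR cannot be established from such a table alone.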
SPSS offers estimation methods that give the researcher statistical techniques for handling or estimating the missing values. These include regression, maximum likelihood estimation, list-wise or pair-wise deletion, approximate Bayesian bootstrap, multiple data imputation, and many others.
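As an illustration of the regression approach named above, the following sketch fills in missing values of one variable from a simple least-squares line fitted to the complete cases. It is written in plain Python and is not SPSS's procedure; the data and the helper names `fit_line` and `regression_impute` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_impute(pairs):
    """Replace each missing y (None) with the value predicted from x."""
    observed = [(x, y) for x, y in pairs if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    return [(x, y if y is not None else intercept + slope * x)
            for x, y in pairs]


# Hypothetical (x, y) data in which the observed points follow y = 2x + 1.
data = [(1, 3), (2, 5), (3, None), (4, 9), (5, None)]

# The two missing y values are filled with approximately 7 and 11.
print(regression_impute(data))
```

Because every imputed value lies exactly on the fitted line, this simple form understates the variability of the data, which is one reason multiple imputation is often preferred.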