
Pre-Analysis Data Screening


You have finally collected the data that you wish to examine statistically. What preliminary steps should you take before jumping in and running inferential analyses? Typically, it is important to screen the data for missing responses and for outlying responses.

Depending on the statistical analysis you are running, there are several ways to handle missing responses. Listwise deletion, also known as complete-case analysis, removes all data for any case that has one or more missing values. This method is most appropriate in a longitudinal experimental study in which the researcher wants to include only the individuals who participated in the entire process (e.g., both the pretest AND the posttest). In most other research designs, it is not the best choice, because it can discard a large share of otherwise usable data. Pairwise deletion, also known as available-case analysis, uses as much of the available data as possible. For example, in a correlation analysis, each correlation is computed from every case that has values for both variables in that pair, so different correlations may rest on different numbers of cases. Advanced statistical techniques such as Structural Equation Modeling (SEM) frequently carry a strict assumption that there can be no missing cells. In such cases, multiple imputation or median replacement is commonly used to fill in the missing data.

Outliers are extreme values within a data set that have the potential to skew findings. Even a few extreme values left in a data set can drastically alter statistical results, so it is important to identify such cases and, where warranted, remove them. One common method for pinpointing outliers is to standardize the scores (convert them to z-scores). Tabachnick and Fidell (2013) suggest removing values that fall more than ±3.29 standard deviations from the mean. In a normal distribution, scores this extreme are expected to make up only about 0.10% of the data (in other words, they are very infrequent). A second common method is to examine a boxplot and remove any values that fall more than 1.5 times the length of the box (the interquartile range, Q3 minus Q1) below the lower end of the box (Quartile 1) or above the upper end (Quartile 3).
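Both screening rules can be sketched in a few lines of standard-library Python. This is a minimal illustration on hypothetical scores, not a prescribed procedure: the functions only flag candidate outliers so the researcher can decide what to do with them. It also checks the 0.10% figure by computing the share of a normal distribution lying beyond ±3.29. The z-scores here use the sample mean and standard deviation (including the extreme value), and quartiles use Python's `method="inclusive"`; other software packages compute quartiles slightly differently, so boxplot fences may not match exactly.

```python
import math
from statistics import mean, stdev, quantiles

def normal_two_tailed_proportion(z):
    """Share of a normal distribution lying more than z SDs from the mean."""
    return math.erfc(z / math.sqrt(2))

def z_score_outliers(values, cutoff=3.29):
    """Flag values whose standardized (z) score is beyond +/- cutoff."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical: none can be extreme
    return [x for x in values if abs((x - m) / s) > cutoff]

def boxplot_outliers(values, k=1.5):
    """Flag values more than k box lengths (IQR) below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

# Hypothetical scores: 20 ordinary values (40 to 59) plus one extreme score.
scores = list(range(40, 60)) + [150]

print(round(normal_two_tailed_proportion(3.29) * 100, 2))  # percent beyond +/-3.29
print(z_score_outliers(scores))   # candidates under the z-score rule
print(boxplot_outliers(scores))   # candidates under the 1.5 x IQR rule

# Removing flagged cases is a separate, deliberate step.
flagged = set(z_score_outliers(scores))
kept = [x for x in scores if x not in flagged]
```

In this toy example both rules flag only the score of 150. With small samples, note that a sample z-score cannot exceed (n − 1)/√n, so the ±3.29 rule can only flag anything once there are roughly 13 or more cases.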

Reference:

Tabachnick, B. G., & Fidell, L. S. (2013). Using Multivariate Statistics (6th ed.). Boston: Allyn and Bacon.