# Data Analysis Plan

Research has several distinct stages, four of which are the research design, the data analysis plan, the statistical analysis, and the reporting of the analysis. This page discusses the data analysis plan, which is distinct from the statistical analysis itself.

The data analysis plan specifies how the data will be cleaned, transformed, and analyzed.

**Cleaning the Data**

Cleaning the data involves removing univariate and multivariate outliers, dealing with missing data, and assessing normality.

**Univariate outliers**

A univariate outlier is an observation that lies more than 3.29 standard deviations from the mean of its variable. To find them, standardize the variable's scores (so that they have a mean of 0 and a standard deviation of 1) and look for any observation with a standardized score (z-score) greater than +3.29 or less than -3.29.
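The standardize-and-screen step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes the sample standard deviation (n - 1 denominator), and the function names and the `scores` data are hypothetical.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # |z| > 3.29 corresponds to p < .001, two-tailed


def z_scores(values):
    """Standardize scores to mean 0 and SD 1 (sample SD, n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_univariate_outliers(values, cutoff=Z_CUTOFF):
    """Return the positions of scores whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


# Hypothetical scores: 19 ordinary values and one extreme value.
scores = [12, 11, 13, 12, 14, 11, 12, 13, 12, 11,
          13, 12, 14, 12, 11, 13, 12, 12, 13, 60]
print(flag_univariate_outliers(scores))  # [19]
```

Here the score of 60 has a z-score of about 4.2, so only its position (19) is flagged.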

**Multivariate outliers**

Multivariate outliers are observations with an unusual combination of scores on two or more variables. To assess for multivariate outliers, you can run a regression with the observation ID number as the dependent variable and the variables being assessed as the predictors, and save each observation's Mahalanobis distance. (The ID variable is arbitrary; the distances depend only on the predictors.) Then compare each observation's Mahalanobis distance with the critical chi-square value at p = .001, with degrees of freedom equal to the number of variables assessed.
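Because the distances depend only on the variables being assessed, they can also be computed directly without the regression step. The sketch below assumes NumPy and SciPy are available and uses the sample covariance matrix; the function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the column means."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance (n - 1)
    # Row-wise quadratic form: diff_i' * S^-1 * diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def flag_multivariate_outliers(X, alpha=0.001):
    """Return (indices of rows above the chi-square critical value, critical value)."""
    X = np.asarray(X, dtype=float)
    d2 = mahalanobis_d2(X)
    critical = chi2.ppf(1 - alpha, df=X.shape[1])  # df = number of variables
    return np.flatnonzero(d2 > critical).tolist(), float(critical)


# Hypothetical data: two strongly correlated variables plus one point (3, -3)
# that runs against the correlation.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
X = np.vstack([X, [3.0, -3.0]])
flagged, critical = flag_multivariate_outliers(X)
print(round(critical, 3), 200 in flagged)  # 13.816 True
```

The appended point is only moderately extreme on each variable separately. It is the combination (high on one, low on the other, despite a strong positive correlation) that produces a large distance.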

**Missing data**

Missing data is the absence of an observation on a variable. Common remedies include dropping observations that have missing data, substituting the variable's mean, and multiple imputation (e.g., in SPSS or EQS).
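The first two remedies are easy to illustrate with pandas. This sketch assumes pandas is installed and uses a hypothetical two-variable data set. Multiple imputation is not shown because it requires a dedicated procedure.

```python
import pandas as pd

nan = float("nan")
# Hypothetical data set with two missing values
df = pd.DataFrame({"a": [1.0, 2.0, nan, 4.0], "b": [10.0, nan, 30.0, 40.0]})

# Dropping observations: keep only rows with no missing values (listwise deletion)
listwise = df.dropna()

# Mean substitution: replace each missing value with that variable's mean
mean_filled = df.fillna(df.mean(numeric_only=True))

print(listwise)
print(mean_filled)
```

Deletion discards half of this small sample, while mean substitution keeps every row but shrinks each variable's variance, because every filled-in value sits exactly at the mean.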

**Normality**

Normality refers to whether the distribution of scores follows the shape of the normal bell curve. To assess normality, a researcher can examine the skewness and kurtosis of a variable, or conduct a one-sample Kolmogorov-Smirnov (KS) test. The KS test indicates whether the distribution of the data differs significantly from a normal distribution.
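The checks above can be sketched with SciPy. This assumes SciPy's bias-corrected skewness and excess kurtosis (both 0 for a normal curve), and a KS test of the standardized scores against the standard normal distribution. Strictly, estimating the mean and SD from the same sample makes this plain KS test conservative; the Lilliefors correction addresses that. The function name and sample are hypothetical.

```python
import numpy as np
from scipy.stats import kstest, kurtosis, skew


def normality_summary(values):
    x = np.asarray(values, dtype=float)
    # Standardize, then compare against the standard normal distribution
    z = (x - x.mean()) / x.std(ddof=1)
    ks = kstest(z, "norm")
    return {
        "skewness": float(skew(x, bias=False)),             # 0 for a normal curve
        "excess_kurtosis": float(kurtosis(x, bias=False)),  # 0 for a normal curve
        "ks_statistic": float(ks.statistic),
        "ks_p_value": float(ks.pvalue),  # p < .05 suggests non-normality
    }


# Hypothetical, clearly right-skewed sample
rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=500)
summary = normality_summary(sample)
print(summary)
```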

**Transforming the Data**

Many multivariate tests assume normality. When the data are not normally distributed, a transformation of the data may be appropriate. Common transformations include the square root, logarithmic, and inverse transformations.
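A minimal sketch of these transformations, assuming positively skewed scores that are all greater than zero (the log and inverse are undefined at 0, so a constant is usually added first). The `transform` helper and the `raw` scores are hypothetical.

```python
import numpy as np
from scipy.stats import skew

# Common transformations for positively skewed scores
TRANSFORMS = {
    "square root": np.sqrt,
    "log": np.log10,
    "inverse": np.reciprocal,
}


def transform(values, kind):
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        # log and inverse are undefined at 0; add a constant to the scores first
        raise ValueError("scores must be positive; add a constant first")
    return TRANSFORMS[kind](x)


# Hypothetical, positively skewed scores
raw = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]
for kind in ("square root", "log"):
    before = round(float(skew(raw)), 2)
    after = round(float(skew(transform(raw, kind))), 2)
    print(kind, before, "->", after)
```

For these scores the square root reduces the skew and the log reduces it further. Note that the inverse reverses the order of the scores, so the direction of any later effects must be interpreted accordingly.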

**Analyzing the Data**

The selection of the analysis is based on two things: the way the hypothesis is stated in statistical language and the level of measurement of the variables.

**The Hypothesis**

The way the researcher states the hypothesis makes a difference in the data analysis. Here are three null hypothesis examples: (1) Variable A does not relate to Variable B; (2) Variable A does not predict Variable B; (3) there are no differences on Variable A by Variable B. Hypothesis (1) tends to be stated in the language of correlation or chi-square, (2) in the language of regression, and (3) in the language of ANOVA or perhaps Mann-Whitney. How does one choose the precise analysis? It depends on the level of measurement of Variables A and B.
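To see how the framing changes what is tested, the sketch below runs each framing with SciPy on hypothetical data. For framing (3), Variable B must be a grouping variable, so a hypothetical `group` column stands in for it.

```python
from scipy.stats import f_oneway, linregress, pearsonr

# Hypothetical interval-level scores for eight participants
var_a = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0, 13.0]
var_b = [1.0, 3.0, 6.0, 6.0, 8.0, 11.0, 11.0, 14.0]
# Hypothetical grouping variable for the "differences" framing
group = ["g1", "g1", "g1", "g2", "g2", "g2", "g3", "g3"]

# (1) Relationship: does A relate to B?
r, p_corr = pearsonr(var_a, var_b)

# (2) Prediction: does A predict B? (B is the dependent variable)
fit = linregress(var_a, var_b)

# (3) Differences: do scores on A differ across the groups?
samples = [[a for a, g in zip(var_a, group) if g == level]
           for level in sorted(set(group))]
f_stat, p_anova = f_oneway(*samples)

print(f"r = {r:.3f}, p = {p_corr:.4f}")
print(f"slope = {fit.slope:.3f}, p = {fit.pvalue:.4f}")
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```

With a single predictor, the regression slope's p-value equals the correlation's p-value. The framing changes what is reported and interpreted (a strength of association versus the expected change in B per unit of A), not the underlying significance test.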

**Level of Measurement of the Variables to Select the Correct Data Analysis**

In the hypotheses above, the level of measurement of the variables is a key factor in selecting the correct analysis. In example (1), if both variables are categorical, the correct analysis is a chi-square test; if both variables are interval-level, it is a Pearson correlation. In example (2), regression is appropriate because it examines the influence of one variable on another: linear regression is the correct analysis if the dependent variable is interval-level, logistic regression if it is dichotomous, and multinomial logistic regression if it is categorical with three or more categories. In example (3), if the dependent variable is interval-level, an ANOVA may be appropriate; if it is ordinal, a Mann-Whitney U test may be used when comparing two groups.
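The selection rules in this paragraph can be written as a small lookup function. This is a hypothetical helper that covers only the combinations named here and raises an error for anything else. It is not a complete guide to test selection.

```python
CATEGORICAL = {"dichotomous", "categorical"}  # "categorical" = three or more categories


def choose_analysis(hypothesis, level_a, level_b):
    """Suggest an analysis for the three hypothesis framings in this section.

    hypothesis: "relate" (1), "predict" (2: A predicts B, so B is the DV),
    or "differ" (3: differences on A by B, so A is the DV).
    Levels: "dichotomous", "categorical", "ordinal", or "interval".
    """
    if hypothesis == "relate":
        if level_a in CATEGORICAL and level_b in CATEGORICAL:
            return "chi-square test"
        if level_a == "interval" and level_b == "interval":
            return "Pearson correlation"
    elif hypothesis == "predict":
        by_dv_level = {
            "interval": "linear regression",
            "dichotomous": "logistic regression",
            "categorical": "multinomial logistic regression",
        }
        if level_b in by_dv_level:
            return by_dv_level[level_b]
    elif hypothesis == "differ" and level_b in CATEGORICAL:
        if level_a == "interval":
            return "ANOVA"
        if level_a == "ordinal" and level_b == "dichotomous":
            return "Mann-Whitney U test"  # two groups only
    raise ValueError("combination not covered by these examples")


print(choose_analysis("predict", "interval", "dichotomous"))  # logistic regression
```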

**Putting the Data Analysis All Together**

In the data analysis plan, first address data cleaning and transformation, then describe how the data will be analyzed. Be sure to state the hypotheses in the form that matches your question: to examine relationships, to predict, or to examine differences on one variable by another variable.

Statistics Solutions can assist with the development of your quantitative or qualitative data analysis plan. We offer the following services:

*Data Analysis Plan*

- Edit your research questions and null/alternative hypotheses
- Edit your data analysis plan: specify the statistics that address the research questions, state their assumptions, justify why they are the appropriate statistics, and provide references
- Justify your sample size/power analysis and provide references
- Explain your data analysis plan to you so you are comfortable and confident

*Please call 877-437-8622 to request a quote based on the specifics of your research, or email Info@StatisticsSolutions.com.*