Pearson Correlation Assumptions

Understanding the Pearson Product-Moment Correlation: A Guide with Social Science Applications

The Pearson product-moment correlation coefficient (Pearson’s r) is widely used in the social sciences to assess the strength and direction of the linear relationship between two continuous variables. Its valid application depends on four assumptions: level of measurement, related pairs, absence of outliers, and linearity. Each assumption is described below with an example from social science research.
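To make the coefficient concrete, here is a minimal sketch that computes Pearson’s r directly from its definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The function name pearson_r and the stress and performance values are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs of values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))  # sum of cross-products
    ss_x = sum(a * a for a in dev_x)                  # sum of squares for x
    ss_y = sum(b * b for b in dev_y)                  # sum of squares for y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: stress scale scores and job performance ratings
stress = [10, 14, 18, 22, 26, 30]
performance = [88, 84, 79, 77, 70, 66]
r = pearson_r(stress, performance)
print(round(r, 3))  # a value near -1 indicates a strong negative linear relationship
```

In practice a statistics package would normally be used; the hand-written version is shown only so that each part of the calculation is visible.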

1. Level of Measurement

Assumption: Both variables should be measured on a continuous (interval or ratio) scale.

Explanation: Continuous variables are those that can take on any value within a range. In social sciences, examples include age, income, or scores on a psychological scale.

Social Science Example: When studying the relationship between stress levels and job performance, both variables can be treated as continuous if stress is measured with a standardized psychological scale that yields a numeric score and job performance is quantified as a numeric performance rating.

Alternative Approach: If one or both variables are ordinal (ranked data), Spearman’s rank-order correlation is a more appropriate choice. For instance, if you’re examining the relationship between educational level (ordinal) and income (continuous), Spearman’s correlation would be suitable.
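As an illustration of the alternative, Spearman’s rank-order correlation can be computed as Pearson’s r applied to the ranks of the data, with tied values given the average of their ranks. The sketch below assumes that approach; the helper names (average_ranks, spearman_rho) and the education and income values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: educational level coded 1-4 (ordinal), income in thousands
education = [1, 2, 2, 3, 3, 4]
income = [22, 31, 28, 45, 52, 90]
rho = spearman_rho(education, income)
print(round(rho, 3))
```

Because only the ordering of the values matters, the result does not depend on how far apart the education codes are spaced.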

2. Related Pairs

Assumption: Each observation must include pairs of values for the two variables.

Explanation: This means for each participant or unit of analysis in your study, you must have data for both variables being correlated.

Social Science Example: If exploring the correlation between educational attainment and health outcomes, each participant in your study needs to have both an educational attainment level and a health outcome measure recorded.
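One common way to satisfy this requirement is to drop any participant who is missing either value before computing the correlation. The following is a minimal sketch under that assumption; the function name complete_pairs and the data are hypothetical, and missing values are assumed to be coded as None or NaN.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_pairs(x, y):
    """Keep only the participants who have a value on both variables."""
    if len(x) != len(y):
        raise ValueError("x and y must have one entry per participant")
    return [(a, b) for a, b in zip(x, y)
            if not is_missing(a) and not is_missing(b)]

# Hypothetical data: years of education and a health outcome score
education_years = [12, 16, None, 14, 18, 12]
health_score = [60, 75, 70, None, 82, float("nan")]

pairs = complete_pairs(education_years, health_score)
print(pairs)       # [(12, 60), (16, 75), (18, 82)]
print(len(pairs))  # 3 of the 6 participants have both values
```

Dropping incomplete cases reduces the sample size, so it is worth reporting how many pairs remain.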

3. Absence of Outliers

Assumption: The data should not contain outliers in either variable.

Explanation: Outliers are extreme values that deviate significantly from the rest of the data. They can distort the correlation, misleadingly strengthening or weakening it.

Social Science Example: In a study on the relationship between age and cognitive function, an exceptionally young or old participant with an atypical cognitive score could be an outlier. Such outliers might skew the analysis, suggesting a stronger or weaker correlation than actually exists.

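Identification: A common rule of thumb flags values more than 3.29 standard deviations from the mean (|z| > 3.29, which corresponds to p < .001, two-tailed, under a normal distribution), though the appropriate criterion can vary with the sample size and the distribution of your data.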
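The standard-deviation rule can be sketched as a z-score screen applied to each variable separately. The example below assumes the sample standard deviation (n - 1 denominator) and uses hypothetical cognitive scores; the function name zscore_outliers is illustrative only.

```python
import statistics

def zscore_outliers(values, threshold=3.29):
    """Return (index, value) for entries more than `threshold` SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    if sd == 0:
        return []  # no variability, so nothing can be flagged
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / sd > threshold]

# Hypothetical cognitive scores: 29 typical values plus one extreme value
scores = list(range(20, 49)) + [200]
flagged = zscore_outliers(scores)
print(flagged)  # [(29, 200)]
```

Note that the extreme value itself inflates the mean and standard deviation, and in a sample of n observations no z-score can exceed (n - 1) / sqrt(n), so with fewer than 13 observations this cutoff cannot flag anything. A scatterplot is a useful complement in small samples.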

4. Linearity

Assumption: The relationship between the variables should be linear.

Explanation: This means that if you were to plot the data on a scatterplot, the overall pattern of the points should suggest a straight line rather than a curve.

Social Science Example: When investigating the correlation between hours of study and exam scores, the linearity assumption implies that each additional hour of study is associated with roughly the same change in exam score (whether an increase or a decrease) across the whole range of the data.

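Non-Linearity: If the relationship appears curved, a transformation of the data (for example, a logarithmic transformation) or an alternative statistical method (for example, Spearman’s rank-order correlation when the relationship is monotonic) might be necessary.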
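To show how a transformation can help, the sketch below uses deliberately artificial data in which y doubles with every unit increase in x. Pearson’s r understates this perfectly regular but curved relationship, while correlating x with the logarithm of y recovers a straight line. The data and the choice of a log transformation are assumptions made for illustration; real data will rarely be this clean, and a scatterplot should guide the choice.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Artificial curved relationship: y doubles for each unit increase in x
x = list(range(1, 11))
y = [2 ** xi for xi in x]

r_raw = pearson_r(x, y)                         # curve summarized by a line
r_log = pearson_r(x, [math.log(v) for v in y])  # log(y) is exactly linear in x

print(round(r_raw, 3))
print(round(r_log, 3))
```

Here r_raw is noticeably below 1 even though y is completely determined by x, which is why a weak Pearson correlation should not be read as a weak relationship until linearity has been checked.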
