What is Point-Biserial Correlation?
Like all correlation analyses, the Point-Biserial Correlation measures the strength of association or co-occurrence between two variables. Correlation analyses express this strength of association in a single value, the correlation coefficient.
Point-Biserial Correlation Analysis in SPSS
We’re exploring the association between passing the exam and scores in math, reading, and writing. For the analysis to work correctly, the measurement level of each variable must be defined in the Variable View. SPSS calculates the Point-Biserial Correlation by using Pearson’s Bivariate Correlation Coefficient.
Before using Pearson’s r as the Point-Biserial Correlation, we should check visually whether the two variables are related. The first step, as described in the Pearson’s Bivariate Correlation section, is to draw a scatter plot of both variables. For the Point-Biserial Correlation Coefficient, the diagram looks like this.
The diagram shows a positive relationship between math score and exam outcome. Since the exam outcome is a nominal variable, a box plot is a better display. To create a box plot, we select Graphs/Chart Builder… and select the Simple Box plot from the list in the Gallery. Drag Exam onto the x-axis and Math Test onto the y-axis.
A box plot displays the distribution information of a variable. More specifically, it displays the quartiles of the data. The box spans the 25th to the 75th percentile (the first to the third quartile), with the median as a line inside. The whiskers extend from the box to the lowest and highest values that are not classified as outliers; if the sample contains no outliers, they reach the minimum and maximum of the data. If the sample contains outliers, those data points appear outside the whiskers.
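The quantities a box plot draws can be sketched in a few lines of Python. This is only an illustration: the function name `box_plot_summary` and the scores are hypothetical, and the outlier rule (points more than 1.5 times the interquartile range beyond the box) is a common convention assumed here. Quartile conventions differ between programs, so SPSS may report slightly different box edges for the same data.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws for one group of scores."""
    data = sorted(values)
    # Quartiles; the "inclusive" method is one of several conventions.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: beyond 1.5 * IQR from the box edges.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest non-outlying value
        "whisker_high": max(inside),   # highest non-outlying value
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical math scores for one group (illustrative values only).
scores = [35, 38, 40, 41, 43, 44, 46, 47, 49, 90]
summary = box_plot_summary(scores)
print(summary)
```

In this made-up sample the value 90 falls outside the upper fence, so the upper whisker stops at 49 and 90 is drawn as a separate point.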
The box plot shows that students who passed the final exam tend to have higher math scores, with almost no overlap between the two groups. Now that we understand the direction of the association between the two variables, we can conduct the Point-Biserial Correlation Analysis.
SPSS does not have a special procedure for the Point-Biserial Correlation Analysis. To calculate a Point-Biserial Correlation in SPSS, you must use the Pearson’s r procedure. We therefore open the Bivariate Correlations dialog via Analyze/Correlate/Bivariate…
In the dialog box we add both variables to the list of variables to analyze and select Pearson Correlation Coefficient and two-tailed Test of Significance.
The Point-Biserial Correlation Coefficient is a correlation measure of the strength of association between a continuous-level variable (ratio or interval data) and a binary variable. Binary variables are variables of nominal scale with only two values. Researchers also call them dichotomous variables or, in regression analysis, dummy variables. Binary variables are commonly used to express the existence of a certain characteristic (e.g., reacted or did not react in a chemistry sample) or the membership in a group of observed specimens (e.g., male or female).
If needed for the analysis, binary variables can also be created artificially by grouping cases or recoding variables. However, it is not advised to artificially create a binary variable from ordinal or continuous-level (interval or ratio) data, because ordinal and continuous-level data contain more variance information than nominal data and thus make any correlation analysis more reliable. For ordinal data, use the Spearman Correlation Coefficient rho; for continuous-level (interval or ratio) data, use Pearson’s Bivariate Correlation Coefficient r. The Point-Biserial Correlation Coefficient is typically denoted as r_pb.
Like all Correlation Coefficients (e.g. Pearson’s r, Spearman’s rho), the Point-Biserial Correlation Coefficient measures the strength of association of two variables in a single measure ranging from -1 to +1, where -1 indicates a perfect negative association, +1 indicates a perfect positive association and 0 indicates no association at all. All correlation coefficients are interdependency measures that do not express a causal relationship.
Mathematically, the Point-Biserial Correlation Coefficient is calculated just as the Pearson’s Bivariate Correlation Coefficient would be calculated, with the dichotomous variable coded as either 0 or 1, which is why it is also called the binary variable. Since we use the same mathematical concept, we need to fulfill the same assumptions, which are normal distribution of the continuous variable and homoscedasticity.
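To make this equivalence concrete, here is a minimal Python sketch that computes Pearson’s r on a 0/1-coded variable and compares it with the textbook point-biserial formula, r_pb = (M1 - M0) / s_n * sqrt(p * q), where M1 and M0 are the group means, s_n is the standard deviation of all scores (dividing by n), and p and q are the group proportions. The function names and the data are hypothetical and serve only as illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's bivariate correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, score):
    """Point-biserial formula: (M1 - M0) / s_n * sqrt(p * q)."""
    n = len(score)
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    mean_all = sum(score) / n
    # Standard deviation of all scores, dividing by n (not n - 1).
    sd_n = math.sqrt(sum((s - mean_all) ** 2 for s in score) / n)
    p = len(ones) / n
    q = len(zeros) / n
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_n * math.sqrt(p * q)

# Hypothetical data: exam outcome (0 = failed, 1 = passed) and math score.
exam = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
math_score = [35, 42, 48, 51, 55, 60, 63, 68, 72, 80]

r = pearson_r(exam, math_score)
r_pb = point_biserial(exam, math_score)
print(r, r_pb)  # the two values agree up to floating-point rounding
```

Because both routes give the same number, running the ordinary Pearson procedure on a 0/1 variable is sufficient.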
Typical questions to be answered with a Point-Biserial Correlation Analysis are as follows:
Since all correlation analyses require the variables to be randomly independent, the Point-Biserial Correlation is not the best choice for analyzing data collected in experiments. For these cases, a Linear Regression Analysis with Dummy Variables is a better choice. Also, many of the questions typically answered with a Point-Biserial Correlation Analysis can be answered with an independent samples t-Test or other dependency tests (e.g., Mann-Whitney U, Kruskal-Wallis H, and Chi-Square). Not only are some of these tests robust regarding the requirement of normally distributed variables, but they also analyze a dependency or causal relationship between an independent variable and the dependent variables in question.
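The close link to the independent samples t-Test can be checked numerically. The sketch below, using hypothetical data and function names, computes the pooled-variance t statistic for the two groups and compares it with the t statistic obtained from the correlation through t = r * sqrt(n - 2) / sqrt(1 - r^2). It assumes the equal-variance (pooled) form of the t-Test, not the Welch variant.

```python
import math

def pearson_r(x, y):
    """Pearson's bivariate correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group, score):
    """Independent samples t statistic with pooled variance."""
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    n1, n0 = len(ones), len(zeros)
    m1, m0 = sum(ones) / n1, sum(zeros) / n0
    ss1 = sum((s - m1) ** 2 for s in ones)
    ss0 = sum((s - m0) ** 2 for s in zeros)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))

def t_from_r(r, n):
    """t statistic for testing a correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical data: exam outcome (0 = failed, 1 = passed) and math score.
exam = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
math_score = [35, 42, 48, 51, 55, 60, 63, 68, 72, 80]

t_groups = pooled_t(exam, math_score)
t_corr = t_from_r(pearson_r(exam, math_score), len(exam))
print(t_groups, t_corr)  # both routes give the same t statistic
```

Both statistics have n - 2 degrees of freedom, so the significance test of the correlation and the pooled t-Test lead to the same p-value.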