May 22, 2012

Nominal Variable Association

Many variables are measured at the nominal level, meaning that the categories of the variables have no inherent ranking.  Examples of nominal variables commonly of interest include race, religious affiliation, or college major.  The examination of bivariate relationships between nominal variables most commonly uses crosstabulation (also known as contingency or bivariate tables).

Crosstabulation allows researchers to examine the association between two variables; specifically, it shows whether being in one category of the independent variable makes a case more likely to be in a particular category of the dependent variable.  Using the variables above as an example, crosstabulation can address the question of whether African American students are more likely to major in business or whether Hispanic students are more likely to major in the natural sciences.  Patterns of association can be examined simply by comparing percentages across rows of the table (assuming that the convention of putting the independent variable in the columns and the dependent variable in the rows has been followed).
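As a rough sketch of the percentage comparison described above, the following Python snippet converts a small table of counts into column percentages. The counts, the group labels, and the function name `column_percentages` are all invented for illustration; the layout assumes the convention of placing the independent variable in the columns and the dependent variable in the rows.

```python
# Hypothetical counts, invented purely for illustration.
# Rows: dependent variable (college major).
# Columns: independent variable (two made-up student groups).
row_labels = ["Business", "Natural sciences", "Other"]
col_labels = ["Group A", "Group B"]
counts = [
    [30, 20],
    [20, 50],
    [50, 30],
]


def column_percentages(table):
    """Convert each cell to a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


percentages = column_percentages(counts)

# Print one row per major so the two groups can be compared side by side.
print(f"{'':<18}" + "".join(f"{label:>10}" for label in col_labels))
for label, row in zip(row_labels, percentages):
    print(f"{label:<18}" + "".join(f"{value:>9.1f}%" for value in row))
```

In this made-up table, 50% of Group B falls in the natural sciences row compared with 20% of Group A, a difference of 30 percentage points.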

Examining the percentages is useful for establishing whether a pattern exists in the table, and calculating percent differences can speak generally to the strength of the relationship, but crosstabulation of nominal variables can provide additional information as well.  Using a Chi-Square Test of Independence allows researchers to assess whether the relationship observed between the nominal variables in a particular sample is also likely to be found in the population.  Several measures also exist that allow researchers to evaluate the strength of the association between two nominal variables.  Such measures are similar to Pearson’s correlation in that they have specific bounds within which they fall and therefore provide a standard way of speaking about the strength of the association between two nominal variables.  A general rule of thumb for interpreting the strength of associations is:

< .10 = weak

.10 – .30 = moderate

> .30 = strong
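The Chi-Square statistic behind the test of independence can be sketched in a few lines of Python. This is a minimal illustration, not a full test: the counts are invented, the function name `chi_square` is hypothetical, and instead of computing a p-value the code compares the statistic against the standard .05 critical values for 1 to 4 degrees of freedom.

```python
# Hypothetical counts, invented for illustration
# (rows: dependent variable, columns: independent variable).
counts = [
    [30, 20],
    [20, 50],
    [50, 30],
]

# Standard chi-square critical values at the .05 level, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square(counts)
significant = statistic > CRITICAL_05[df]
print(f"chi-square = {statistic:.3f}, df = {df}, significant at .05: {significant}")
```

For these invented counts the statistic is about 19.86 with 2 degrees of freedom, which exceeds the .05 critical value of 5.991. In practice a statistical package would report an exact p-value.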

Contingency Coefficient

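Each of the next three measures uses the Chi-Square value calculated for the crosstabulation table of interest.  The contingency coefficient is calculated as follows:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

where \chi^2 is the Chi-Square value and N is the sample size.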

This measure has a lower bound of 0, with larger values indicating a stronger association between the variables.  Its upper bound is below 1 and depends on the size of the table (about .71 for a 2 x 2 table, for example), so the CC should be interpreted with caution and is difficult to compare across tables of different sizes.
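A short Python sketch may make the table-size sensitivity concrete. It assumes the usual definition of the contingency coefficient, the square root of Chi-Square divided by Chi-Square plus the sample size. The function names and the input values are hypothetical.

```python
import math


def contingency_coefficient(chi2, n):
    """Contingency coefficient from a chi-square value and the sample size."""
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(k):
    """Largest attainable value when the smaller table dimension is k."""
    return math.sqrt((k - 1) / k)


# Hypothetical inputs: a chi-square of 19.857 from a sample of 200 cases.
cc = contingency_coefficient(19.857, 200)
print(f"CC = {cc:.3f}")

# The ceiling rises with table size, which is why raw values are hard to compare.
for k in (2, 3, 4, 5):
    print(f"smaller dimension {k}: maximum CC = {max_contingency_coefficient(k):.3f}")
```

Because the ceiling is roughly .71 when the smaller dimension is 2 but about .89 when it is 5, the same coefficient can signal quite different strengths in tables of different sizes.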

Phi Coefficient

A measure of association used for 2 x 2 tables is the Phi coefficient:

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

where \chi^2 is the Chi-Square value and N is the sample size.

Again, the measure ranges between 0 and 1 with higher values meaning a stronger association.
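For a 2 x 2 table, Phi can be sketched in Python in two equivalent ways: from the Chi-Square value, or directly from the four cell counts. The cell counts and function names below are invented for illustration, and the absolute value is taken so the result stays within the 0 to 1 range described above.

```python
import math


def phi_from_chi_square(chi2, n):
    """Phi as the square root of chi-square divided by the sample size."""
    return math.sqrt(chi2 / n)


def phi_from_cells(a, b, c, d):
    """Phi computed directly from the cells of the 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return abs(numerator) / denominator


# Hypothetical 2 x 2 table: [[30, 20], [20, 30]], N = 100.
# Every expected count is 25, so chi-square = 4 * (5 ** 2 / 25) = 4.0.
phi_a = phi_from_chi_square(4.0, 100)
phi_b = phi_from_cells(30, 20, 20, 30)
print(f"phi from chi-square: {phi_a:.3f}")
print(f"phi from cell counts: {phi_b:.3f}")
```

Both routes give a Phi of about .20 for this made-up table.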

Cramer’s V

When the crosstabulation table is larger than 2 x 2, Cramer’s V is the best choice:

$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

Here, N is the sample size and k is the smaller of the number of rows or columns (so it would be 3 for a 3 x 4 table).
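The calculation can be sketched in Python as follows, using the definitions of N and k given above. The function name `cramers_v` and the input values (a Chi-Square of 24.0 from 300 cases in a 3 x 4 table) are hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square value, sample size, and table dimensions."""
    k = min(n_rows, n_cols)  # the smaller of the number of rows or columns
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical inputs: chi-square of 24.0, N = 300, 3 x 4 table, so k = 3.
v = cramers_v(24.0, 300, 3, 4)
print(f"Cramer's V = {v:.3f}")
```

Here V is the square root of 24 / (300 x 2) = .04, or about .20. For a 2 x 2 table k - 1 equals 1, so the expression reduces to the Phi formula.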

Lambda

Unlike the above Chi-Square based measures, Lambda is a Proportional Reduction in Error (PRE) measure which is interpreted as the amount of improvement in predicting the dependent variable that can be attributed to the independent variable.  Another way to say this is: how much better is our guess about which category of the dependent variable each case will fall into if we know the case’s value on the independent variable?  Much like R-squared in regression (which is also a PRE measure), lambda is often multiplied by 100 and read as a percentage; for lambda, this is the percentage reduction in prediction errors rather than a percentage of explained variance.  Lambda is a directional measure in that the calculation differs based on which variable is treated as the independent variable.
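The PRE logic can be sketched in Python, assuming the usual Goodman and Kruskal definition of lambda: count the prediction errors made when always guessing the most common category of the dependent variable, then count the errors made when guessing the most common category within each category of the independent variable. The counts and function names are invented for illustration.

```python
# Hypothetical counts, invented for illustration
# (rows: dependent variable, columns: independent variable).
counts = [
    [30, 20],
    [20, 50],
    [50, 30],
]


def lambda_pre(table):
    """Lambda for predicting the row variable from the column variable."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    # Errors when always guessing the modal row category, ignoring columns.
    errors_without = n - max(row_totals)
    # Errors when guessing the modal row category within each column.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    if errors_without == 0:
        return 0.0  # every case is in one row category; nothing left to improve
    return (errors_without - errors_with) / errors_without


def transpose(table):
    """Swap rows and columns so the other variable is treated as dependent."""
    return [list(col) for col in zip(*table)]


lambda_rows = lambda_pre(counts)
lambda_cols = lambda_pre(transpose(counts))
print(f"lambda, row variable dependent: {lambda_rows:.3f}")
print(f"lambda, column variable dependent: {lambda_cols:.3f}")
```

With these made-up counts, knowing the column variable cuts errors from 120 to 100 (lambda of about .17), while predicting in the other direction cuts errors from 100 to 70 (lambda of .30), which illustrates why the direction matters.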

All of the above measures of association are available by clicking on the Statistics button when requesting crosstabulations in SPSS.  Lambda is calculated in both directions, treating each variable as independent.  The other three measures are symmetric, meaning that it does not matter which variable is treated as independent.