Establishing Cause and Effect

A central goal of most research is the identification of causal relationships, that is, demonstrating that a particular independent variable (the cause) has an effect on the dependent variable of interest (the effect).  The three criteria for establishing cause and effect (association, time ordering or temporal precedence, and non-spuriousness) are familiar to most researchers from courses in research methods or statistics.  While the classic examples used to illustrate these criteria may imply that establishing cause and effect is straightforward, it is often one of the most challenging aspects of designing research studies for implementation in real-world conditions.

The first step in establishing causality is demonstrating association; simply put, is there a relationship between the independent variable and the dependent variable?  If both variables are numeric, this can be established by examining the correlation between the two to determine whether they appear to covary.  A common example is the relationship between education and income: in general, individuals with more years of education are also likely to earn higher incomes.  Cross tabulation, which cross-classifies the distributions of two categorical variables, can also be used to examine association.  For example, we may observe that 60% of Protestants support the death penalty while only 35% of Catholics do so, establishing an association between denomination and attitudes toward capital punishment.  There is ongoing debate about how strongly variables must be associated to support a causal claim, but in general researchers are more concerned with the statistical significance of an association (whether it is likely to exist in the population) than with its actual strength.
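To make the two ways of checking association concrete, here is a minimal Python sketch using only the standard library. The helper names `pearson_r` and `row_percentages` are hypothetical, and all of the data are invented for illustration: the education and income values are made up, and the denomination counts were chosen only so that they reproduce the 60% and 35% figures in the example. None of it comes from an actual survey. The sketch measures association only; it does not test statistical significance, which a statistics package would normally report alongside these numbers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (numerator of r)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (used in the denominator of r)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def row_percentages(table):
    """Turn a crosstab of counts {row: {column: count}} into row percentages."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * count / total for col, count in cells.items()}
    return result


# Hypothetical numeric data: years of education and income (in thousands)
years_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [28, 35, 31, 42, 55, 50, 63, 70]
print(round(pearson_r(years_education, income_thousands), 3))

# Hypothetical crosstab counts: denomination by attitude toward the death penalty
counts = {
    "Protestant": {"support": 60, "oppose": 40},
    "Catholic": {"support": 35, "oppose": 65},
}
pct = row_percentages(counts)
print(pct["Protestant"]["support"], pct["Catholic"]["support"])  # prints: 60.0 35.0
```

With these made-up values the correlation comes out strongly positive (close to 1), and the row percentages show the gap between the two groups. Percentaging within rows (within each denomination) is the design choice that lets the two groups be compared even when they contain different numbers of respondents.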

Once an association has been established, our attention turns to determining the time order of the variables of interest.  In order for the independent variable to cause the dependent variable, logic dictates that the independent variable must occur first in time; in short, the cause must come before the effect.  This time ordering is easy to ensure in an experimental design, where the researcher carefully controls exposure to the treatment (the independent variable) and then measures the outcome of interest (the dependent variable).  In cross-sectional designs, where all variables are measured at the same point in time, time ordering can be much more difficult to determine, especially when the relationship between variables could reasonably run in the opposite direction.  For example, although education usually precedes income, it is possible that individuals who are making a good living may finally have the money necessary to return to school.  When a controlled experimental design is not possible, determining time ordering may therefore rely on logic, existing research, and common sense.  In any case, researchers must carefully specify the hypothesized direction of the relationship between the variables and provide evidence (either theoretical or empirical) to support their claim.

The third criterion for causality is also the most troublesome, as it requires ruling out alternative explanations for the observed relationship between two variables.  This is termed non-spuriousness, which simply means “not false.”  A spurious or false relationship exists when what appears to be an association between two variables is actually produced by a third, extraneous variable.  Classic examples of spuriousness include the relationship between children’s shoe sizes and their academic knowledge: as shoe size increases so does knowledge, but both are also strongly related to age.  Another well-known example is the relationship between the number of firefighters who respond to a fire and the amount of damage that results; clearly, the size of the fire determines both, so it is inaccurate to say that more firefighters cause greater damage.  Though these examples seem straightforward, researchers in psychology, education, and the social sciences often face much greater challenges in ruling out spurious relationships because so many other factors might influence the relationship between two variables.  Appropriate study design (using experimental procedures whenever possible), careful data collection and use of statistical controls, and triangulation across multiple data sources are all essential when seeking to establish non-spurious relationships between variables.
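The idea of a statistical control can be illustrated with the shoe-size example. The minimal Python sketch below assumes a simulated population in which age alone drives both shoe size and knowledge, and shoe size has no direct effect on knowledge. The coefficients, noise levels, sample size, and helper names (`pearson_r`, `partial_r`) are hypothetical choices made for illustration, not estimates from real data. It uses a first-order partial correlation, which is one common form of statistical control (multiple regression and stratification are others), to ask whether shoe size and knowledge remain associated once age is held constant.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical simulated children: age (6 to 12 years) is the only common cause.
rng = random.Random(42)  # fixed seed so the run is reproducible
age = [rng.uniform(6, 12) for _ in range(2000)]
shoe_size = [0.8 * a + rng.gauss(0, 0.5) for a in age]   # depends on age only
knowledge = [10 * a + rng.gauss(0, 5) for a in age]      # depends on age only

raw = pearson_r(shoe_size, knowledge)
controlled = partial_r(shoe_size, knowledge, age)
print(f"raw r = {raw:.2f}, partial r controlling for age = {controlled:.2f}")
```

Under these assumptions the raw correlation between shoe size and knowledge comes out strong, while the partial correlation controlling for age should be close to zero, which is the pattern expected when a relationship is spurious. The technique only works for extraneous variables that were actually measured; an unrecorded third variable cannot be controlled this way.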