CHAID

Chi-square Automatic Interaction Detector (CHAID) is a technique developed by Gordon V. Kass in 1980. CHAID helps identify relationships between variables. CHAID analysis builds a predictive model, or tree, to help determine how variables best merge to explain the outcome in the given dependent variable. CHAID analysis uses nominal, ordinal, and continuous data, splitting continuous predictors into categories with approximately equal numbers of observations. It cross-tabulates categorical predictors against the outcome until the best outcome is achieved and no further split is possible.
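
To illustrate the equal-sized binning of a continuous predictor, here is a minimal Python sketch using pandas. The income values and the number of bins are invented for the example; only the equal-frequency binning itself reflects the CHAID step described above.

import pandas as pd

# Hypothetical continuous predictor (e.g. income); values are made up.
income = pd.Series([21, 35, 48, 52, 60, 75, 80, 95, 110, 150])

# CHAID treats a continuous predictor as ordinal by splitting it into
# categories with (approximately) equal numbers of observations;
# pd.qcut performs exactly this equal-frequency binning.
income_cat = pd.qcut(income, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print(income_cat.value_counts().sort_index())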

In the CHAID technique, the tree makes it possible to see visually how the split variables relate to the associated factors. The decision tree begins by identifying the target or dependent variable, which serves as the root. CHAID analysis splits the target into parent nodes and then applies statistical tests to create child nodes. Unlike regression, CHAID does not require normally distributed data.

Merging: In CHAID analysis, the F test applies to continuous dependent variables, while the chi-square test applies to categorical ones. CHAID repeatedly merges the pair of predictor categories showing the least significant difference with respect to the outcome, then calculates a Bonferroni-adjusted p-value for the merged table to correct for the many comparisons made along the way.
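
A rough sketch of this merging step for a nominal predictor with a categorical outcome is shown below, using scipy's chi-square test. The merge threshold (alpha_merge) and the toy contingency table are assumptions for illustration; the Bonferroni multiplier counts the ways the original c categories can be reduced to r merged groups, as in Kass's method.

import numpy as np
from scipy.stats import chi2_contingency
from math import factorial

def bonferroni_multiplier(c, r):
    # Number of ways to partition c original categories into r merged
    # groups (the multiplier for a nominal predictor).
    return sum((-1) ** i * (r - i) ** c / (factorial(i) * factorial(r - i))
               for i in range(r))

def chaid_merge(table, alpha_merge=0.05):
    # `table`: (predictor categories x outcome classes) contingency counts.
    # Repeatedly merge the pair of categories whose 2-row sub-table shows
    # the LEAST significant chi-square difference, as long as that
    # difference is not significant at alpha_merge.
    table = np.asarray(table, dtype=float)
    while table.shape[0] > 2:
        best_p, best_pair = -1.0, None
        for i in range(table.shape[0]):
            for j in range(i + 1, table.shape[0]):
                _, p, _, _ = chi2_contingency(table[[i, j]])
                if p > best_p:
                    best_p, best_pair = p, (i, j)
        if best_p <= alpha_merge:      # every pair differs significantly
            break
        i, j = best_pair
        table[i] += table[j]           # merge category j into category i
        table = np.delete(table, j, axis=0)
    return table

# Toy data: 4 predictor categories against a binary outcome.
original = [[30, 10], [28, 12], [10, 30], [12, 28]]
merged = chaid_merge(original)
_, p, _, _ = chi2_contingency(merged)
adj_p = min(1.0, p * bonferroni_multiplier(len(original), merged.shape[0]))
print(merged, adj_p)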

Decision tree components in CHAID analysis:

A CHAID decision tree is made up of the following components (a minimal node-structure sketch follows the list):

  1. Root node: The root node holds the target (dependent) variable. For example, a bank predicting credit card risk places that risk at the root, with factors like age, income, and number of credit cards as predictors.
  2. Parent node: The root splits into parent nodes, the categories of the predictor most significantly associated with the target.
  3. Child node: Child nodes are the independent-variable categories that appear below the parent nodes.
  4. Terminal node: The last categories in the tree are terminal nodes. The most influential categories appear first and the least influential appear last; because no further split follows them, these final nodes are called terminal nodes.
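
To make these components concrete, here is a minimal Python sketch of such a node structure. The field names and the bank-example labels are illustrative assumptions, not a fixed CHAID API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # One node of a CHAID tree; field names are illustrative.
    label: str                          # e.g. "root" or a merged category
    children: List["Node"] = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        # A terminal node has no further statistically significant split.
        return not self.children

# The bank example: credit card risk is the target at the root; the most
# significant predictor (say, age) forms the parent nodes, and merged
# income categories below them are child nodes.
root = Node("credit card risk (root)")
young = Node("age <= 30 (parent)")
young.children.append(Node("income: low+medium (child, merged)"))
young.children.append(Node("income: high (child)"))
root.children.append(young)
root.children.append(Node("age > 30 (terminal)"))   # no further split
print(root.children[0].children[0].is_terminal)     # True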
