Discriminant Analysis
Discriminant analysis is a statistical technique used to classify cases into the categories of a categorical dependent variable. Like regression, it is a predictive technique: it predicts the value of the categorical dependent variable from a set of independent variables. In its basic form, discriminant analysis predicts membership in one of two categories. When the dependent variable has more than two categories, the technique extends to multiple discriminant analysis. Multiple discriminant analysis and MANOVA are considered similar because they share many assumptions and tests.
The F test (Wilks' lambda) is used to test whether the discriminant model as a whole is significant. If the overall model is significant, the individual variables are then assessed to see which ones differ significantly in their group means. Discriminant analysis also shares several assumptions with multiple linear regression, such as linear relationships, homoscedasticity, and untruncated interval data. Logistic regression is the usual alternative and is frequently used in place of discriminant analysis when the data do not meet these assumptions.
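To make the two-category classification idea concrete, here is a minimal sketch of the two-group case using Fisher's linear discriminant, one standard way of deriving the coefficients (the text above does not prescribe a method). It assumes exactly two predictors so the pooled within-group covariance matrix can be inverted by hand, and it places the cutoff midway between the group centroids (equal priors). The function name `fisher_lda_2d` and the data are hypothetical. Statistical packages usually rescale the coefficients, so their values may differ from this sketch by a constant factor.

```python
from statistics import fmean


def fisher_lda_2d(group_a, group_b):
    """Fit a two-group linear discriminant function for two predictors.

    group_a, group_b: lists of (x1, x2) cases for the two categories.
    Returns (b, c) so that L = b[0]*x1 + b[1]*x2 + c is positive on
    group A's side, negative on group B's side, and zero at the midpoint
    of the two group centroids (assumes equal prior probabilities).
    """
    mean_a = [fmean([case[j] for case in group_a]) for j in range(2)]
    mean_b = [fmean([case[j] for case in group_b]) for j in range(2)]

    # Pooled within-group covariance matrix [[s11, s12], [s12, s22]].
    s11 = s12 = s22 = 0.0
    for cases, mean in ((group_a, mean_a), (group_b, mean_b)):
        for x1, x2 in cases:
            d1, d2 = x1 - mean[0], x2 - mean[1]
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
    dof = len(group_a) + len(group_b) - 2
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")

    # Coefficients b = S^-1 (mean_a - mean_b), via the 2x2 inverse formula.
    diff = (mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])
    b = ((s22 * diff[0] - s12 * diff[1]) / det,
         (-s12 * diff[0] + s11 * diff[1]) / det)

    # The constant c puts L = 0 midway between the two centroids.
    mid = ((mean_a[0] + mean_b[0]) / 2, (mean_a[1] + mean_b[1]) / 2)
    c = -(b[0] * mid[0] + b[1] * mid[1])
    return b, c


# Hypothetical data: two predictors, four cases per group.
group_a = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
group_b = [(6.0, 6.0), (7.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
b, c = fisher_lda_2d(group_a, group_b)

for case in [(2.5, 2.5), (6.5, 6.0)]:
    score = b[0] * case[0] + b[1] * case[1] + c
    print(case, "A" if score > 0 else "B")
# prints (2.5, 2.5) A, then (6.5, 6.0) B
```

The new case near group A's centroid receives a positive score and the one near group B's centroid a negative score, which is exactly the sign-based classification the cutoff provides.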
Key Terms and Concepts:
Discriminating variables: Discriminating variables are independent variables that are used to predict the dependent variable. These variables are also called the predictors.
The criterion variable: The categorical dependent variable is also called the criterion variable.
Discriminant function: A linear combination of the discriminating (independent) variables is called the discriminant function. It has the general form
$$L = b_1x_1 + b_2x_2 + \cdots + b_nx_n + c$$
where
$L$ = discriminant function
$b_1, \dots, b_n$ = discriminant coefficients
$x_1, \dots, x_n$ = independent variables
$c$ = constant
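Evaluating the discriminant function for a single case is a weighted sum plus a constant. The sketch below computes L for one case; the coefficient values and constant are hypothetical, chosen only to show the arithmetic, since in practice they come from fitting the model.

```python
def discriminant_score(coefficients, values, constant):
    """Evaluate L = b1*x1 + b2*x2 + ... + bn*xn + c for one case.

    coefficients: the discriminant coefficients b1..bn
    values:       the case's values on the independent variables x1..xn
    constant:     the constant term c
    """
    if len(coefficients) != len(values):
        raise ValueError("need exactly one coefficient per independent variable")
    return sum(b * x for b, x in zip(coefficients, values)) + constant


# Hypothetical coefficients for two independent variables.
b = [0.5, -1.25]
c = 2.0
case = [4.0, 2.0]
print(discriminant_score(b, case, c))  # 0.5*4 - 1.25*2 + 2 = 1.5
```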
Number of discriminant functions: With two groups there is one discriminant function. In multiple discriminant analysis with g groups, there are g - 1 discriminant functions, or p functions if the number of predictors p is smaller.
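The count of discriminant functions can be written as a one-line rule. This sketch applies the standard cap that the number of functions can never exceed the number of predictors p, so it returns min(g - 1, p); the function name is hypothetical.

```python
def number_of_discriminant_functions(n_groups, n_predictors):
    """Return how many discriminant functions can be extracted.

    With g groups there are g - 1 functions, but never more than the
    number of predictors p, so the count is min(g - 1, p).
    """
    if n_groups < 2:
        raise ValueError("discriminant analysis needs at least two groups")
    return min(n_groups - 1, n_predictors)


print(number_of_discriminant_functions(2, 5))  # 1 (two groups -> one function)
print(number_of_discriminant_functions(4, 5))  # 3
print(number_of_discriminant_functions(4, 2))  # 2 (limited by the predictors)
```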
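The Eigenvalues: Also called characteristic roots. Each discriminant function has an eigenvalue, equal to the ratio of the between-groups to the within-groups sum of squares of its scores; a larger eigenvalue means the function separates the groups better. Dividing each eigenvalue by the sum of all eigenvalues gives the proportion of the discriminating variance explained by that function.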
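Both quantities can be sketched directly. The first function computes the eigenvalue of a single discriminant function as SS_between / SS_within of its scores. The second converts a list of eigenvalues into the proportion of variance each function explains. The function names, the scores, and the eigenvalues `[3.0, 1.0]` are hypothetical illustrations, not output from a real model.

```python
from statistics import fmean


def eigenvalue_from_scores(groups):
    """Eigenvalue of one discriminant function, computed from its scores.

    groups: list of lists holding the discriminant scores of each group.
    The eigenvalue is SS_between / SS_within of those scores.
    """
    scores = [s for g in groups for s in g]
    grand_mean = fmean(scores)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    return ss_between / ss_within


def proportion_of_variance(eigenvalues):
    """Share of the total discriminating variance explained by each function."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


print(eigenvalue_from_scores([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]))  # 6.0
print(proportion_of_variance([3.0, 1.0]))                          # [0.75, 0.25]
```

In the first example the group means (2 and 6) are far apart relative to the spread inside each group, so SS_between = 24 is six times SS_within = 4.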
The discriminant score: The value obtained by applying the discriminant function to a case's values on the independent variables is called the discriminant score. The discriminant score is used to assign the case to a group category.
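Cutoff: In a two-group analysis, the cutoff is the score value that divides cases into the two groups. A case whose discriminant score falls below (on the negative side of) the cutoff is assigned to the lower category, and a case whose score falls above it is assigned to the higher category. When the two groups are of equal size, the cutoff is usually the midpoint between the two group centroids (the mean discriminant scores of the groups).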
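Here is a minimal sketch of classifying with a cutoff, assuming equal group sizes and equal prior probabilities so that the cutoff is the midpoint between the two group centroids. The group labels, scores, and function names are hypothetical. The tie rule, which assigns a score exactly at the cutoff to the higher group, is an arbitrary choice made for this sketch.

```python
from statistics import fmean


def centroid_cutoff(scores_by_group):
    """Return (lower_label, higher_label, cutoff) for two groups.

    scores_by_group: dict mapping each group label to the discriminant
    scores of its cases. The cutoff is the midpoint of the two centroids,
    which assumes equal group sizes and equal priors.
    """
    if len(scores_by_group) != 2:
        raise ValueError("cutoff classification needs exactly two groups")
    centroids = {label: fmean(s) for label, s in scores_by_group.items()}
    (low, low_c), (high, high_c) = sorted(centroids.items(), key=lambda kv: kv[1])
    return low, high, (low_c + high_c) / 2


def classify(score, lower_label, higher_label, cutoff):
    # Below the cutoff -> lower category; at or above it -> higher category.
    return lower_label if score < cutoff else higher_label


scores = {"pass": [1.0, 2.0, 1.5], "fail": [-2.0, -1.0, -1.5]}
low, high, cut = centroid_cutoff(scores)
print(cut)                             # 0.0
print(classify(-0.3, low, high, cut))  # fail
print(classify(0.8, low, high, cut))   # pass
```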
Tests of significance:
Wilks' lambda: The overall significance of the discriminant model is tested with Wilks' lambda, which ranges from 0 to 1, with smaller values indicating greater differences between the groups. If the overall model is significant, F tests are then used to check, one independent variable at a time, whether that variable's means differ significantly across the groups.
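For a single independent variable, both statistics reduce to sums of squares, as this sketch shows: lambda = SS_within / SS_total and F = (SS_between / (g - 1)) / (SS_within / (N - g)). The sketch covers only this univariate, per-variable test. The overall multivariate lambda instead uses the determinants of the within-groups and total SSCP matrices and is not implemented here. A p-value would come from the F distribution with (g - 1, N - g) degrees of freedom (for example via `scipy.stats.f.sf`), which is omitted to keep the example dependency-free. The function name and data are hypothetical.

```python
from statistics import fmean


def univariate_wilks_f(groups):
    """Wilks' lambda and F statistic for one independent variable.

    groups: list of lists, the variable's values in each of the g groups.
    Returns (lambda, F, (df_between, df_within)).
    """
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = ss_between + ss_within
    wilks = ss_within / ss_total  # near 0: group means differ a lot
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return wilks, f_stat, (k - 1, n - k)


lam, f_stat, df = univariate_wilks_f([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
print(round(lam, 4), f_stat, df)  # 0.1429 24.0 (1, 4)
```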
Assumptions in Discriminant analysis:
1. Independence: Cases should be independent of one another. Correlated observations, such as repeated measurements on the same subjects, cannot be used in discriminant analysis.
2. Adequate sample size: There must be at least two cases in each category of the dependent variable, and it is recommended that there be at least four or five times as many cases as independent variables.
3. Interval data: The independent variables should be measured at the interval level.
4. Variance: No independent variable should have a zero standard deviation in any of the groups formed by the dependent variable.
5. Random error: Error terms are assumed to be randomly distributed.
6. Homogeneity of variances: The variances of the independent variables (more precisely, their variance-covariance matrices) should be equal across the groups of the dependent variable.
7. Absence of perfect multicollinearity: There should be no perfect multicollinearity between the independent variables.
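8. Linearity: The discriminant function is a linear combination of the independent variables, so the relationships among the variables are assumed to be linear.
9. Normally distributed: The predictor variables should be normally distributed (strictly, multivariate normal within each group).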
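Two of these assumptions can be screened mechanically before fitting: adequate sample size (assumption 2) and non-zero within-group standard deviation (assumption 4). The sketch below checks only those two. The others, such as equal covariance matrices (commonly tested with Box's M) and normality, need formal tests not shown here. The function name, the `min_cases_per_predictor` parameter (set to 5 after the four-to-five-cases rule of thumb), and the data are hypothetical.

```python
from statistics import stdev


def check_basic_assumptions(data, min_cases_per_predictor=5):
    """Screen assumptions 2 and 4 before fitting a discriminant model.

    data: dict mapping each group label to its cases, where every case is
    a list of predictor values. Returns a list of warning messages.
    """
    problems = []
    n_predictors = len(next(iter(data.values()))[0])
    n_cases = sum(len(cases) for cases in data.values())

    # Assumption 2: several cases per independent variable overall.
    if n_cases < min_cases_per_predictor * n_predictors:
        problems.append(f"only {n_cases} cases for {n_predictors} predictors")

    for label, cases in data.items():
        # Assumption 2: at least two cases in every group.
        if len(cases) < 2:
            problems.append(f"group {label!r} has fewer than two cases")
            continue
        # Assumption 4: no predictor may be constant within a group.
        for j in range(n_predictors):
            if stdev([case[j] for case in cases]) == 0:
                problems.append(f"predictor {j + 1} has zero SD in group {label!r}")
    return problems


data = {
    "A": [[1.0, 3.0], [2.0, 3.0], [3.0, 3.0]],
    "B": [[5.0, 1.0], [6.0, 2.0], [7.0, 4.0]],
}
for message in check_basic_assumptions(data):
    print(message)
```

Here both checks fire: six cases are too few for two predictors under the rule of thumb, and the second predictor is constant within group A.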