# Conduct and Interpret a Factor Analysis

**What is Factor Analysis?**

Factor analysis is an exploratory analysis. Much like cluster analysis groups similar cases, factor analysis groups similar variables into dimensions. This process is also called identifying latent variables. Since factor analysis is exploratory, it does not distinguish between independent and dependent variables.

Factor analysis reduces the information in a model by reducing the dimensionality of the observations. This procedure serves multiple purposes. It can be used to simplify the data, for example by reducing the number of variables in predictive regression models. When factor analysis is used for this purpose, the factors are most often rotated after extraction. Factor analysis offers several different rotation methods, some of which ensure that the factors are orthogonal, i.e., that the correlation coefficient between any two factors is zero. This eliminates problems of multicollinearity in regression analysis.
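The multicollinearity point can be illustrated with a minimal plain-numpy sketch (the data are simulated for illustration, not from the example in this guide): components extracted orthogonally from the correlation matrix are mutually uncorrelated, so they can safely enter a regression model together.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated example: 200 respondents, 3 test scores driven by one shared
# latent factor, so the raw scores are highly collinear.
latent = rng.normal(size=(200, 1))
scores = latent + 0.5 * rng.normal(size=(200, 3))

# Standardize, then extract principal components from the correlation matrix.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]              # re-sort descending
components = z @ eigvecs[:, order]             # component scores per respondent

# Off-diagonal correlations between components are zero (up to float error),
# which is exactly why orthogonal factors remove multicollinearity.
comp_corr = np.corrcoef(components, rowvar=False)
print(np.max(np.abs(comp_corr[~np.eye(3, dtype=bool)])) < 1e-8)
```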

Factor analysis is also used in theory testing to verify scale construction and operationalizations. In such a case, the scale is specified upfront and we know that a certain subset of the scale represents an independent dimension within this scale. This form of factor analysis is most often used in structural equation modeling and is referred to as Confirmatory Factor Analysis. For example, we know that the questions pertaining to the Big Five personality traits cover all five dimensions: openness (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N). If we want to build a regression model that predicts the influence of the personality dimensions on an outcome variable, for example anxiety in public places, we would specify a confirmatory factor analysis in which the twenty questionnaire items load onto five factors, and then regress those factors onto the outcome variable.

Factor analysis can also be used to construct indices. The most common way to construct an index is simply to sum up its items. In some contexts, however, some variables might have greater explanatory power than others. Also, sometimes similar questions correlate so strongly that we can justify dropping one of them completely to shorten the questionnaire. In such cases, we can use factor analysis to identify the weight each variable should have in the index.
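As a hedged sketch of this idea, the loadings of the items on the first factor can serve as index weights instead of equal weights. The data below are simulated, and the weights are taken from the leading eigenvector of the correlation matrix, which is proportional to the first-factor loadings up to scale and sign:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated index items: three questions driven by one latent trait,
# with deliberately unequal loadings (0.9, 0.7, 0.4).
latent = rng.normal(size=(300, 1))
items = latent @ np.array([[0.9, 0.7, 0.4]]) + 0.5 * rng.normal(size=(300, 3))

# Standardize and take the eigenvector of the largest eigenvalue of the
# correlation matrix as (unscaled) first-factor loadings.
z = (items - items.mean(axis=0)) / items.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
weights = np.abs(eigvecs[:, np.argmax(eigvals)])   # sign of an eigenvector is arbitrary

# Weighted index: items with stronger loadings contribute more.
index = z @ (weights / weights.sum())
print(weights.round(2))
```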

*The Factor Analysis in SPSS*

The research question we want to answer with our explorative factor analysis is as follows:

What are the underlying dimensions of our standardized and aptitude test scores? That is, how do aptitude and standardized tests form performance dimensions?

The factor analysis can be found in *Analyze/Dimension Reduction/Factor…*

In the dialog box of the *factor analysis* we start by adding our variables (the standardized tests math, reading, and writing, as well as the aptitude tests 1-5) to the list of variables.

In the dialog *Descriptives…* we request the statistics needed to verify the assumptions made by the factor analysis. Univariate descriptives are optional, but to verify the assumptions we need the KMO measure of sampling adequacy, Bartlett's test of sphericity, and the anti-image correlation matrix.
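For intuition, the overall KMO measure can be computed by hand: it compares the observed correlations with the partial correlations obtained from the inverse of the correlation matrix (the same quantities that underlie the anti-image matrix). The sketch below is plain numpy, and the correlation matrix is a made-up illustration rather than the example data of this guide:

```python
import numpy as np

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy.

    KMO = sum(r^2) / (sum(r^2) + sum(p^2)) over off-diagonal elements,
    where p are partial correlations derived from the inverse of corr.
    """
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    np.fill_diagonal(partial, 0.0)
    r = corr - np.eye(len(corr))             # off-diagonal correlations only
    sum_r2 = np.sum(r ** 2)
    sum_p2 = np.sum(partial ** 2)
    return sum_r2 / (sum_r2 + sum_p2)

# Hypothetical correlation matrix for three variables.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(round(kmo(R), 3))
```

Values above roughly 0.5 are commonly considered acceptable for factor analysis; values closer to 1 indicate that the variables share enough common variance to factor well.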

The dialog box *Extraction…* allows us to specify the extraction method and the cut-off value for the extraction. Let's start with the easy one, the cut-off value. Generally, SPSS can extract as many factors as we have variables. An *eigenvalue* is calculated for each extracted factor. If the *eigenvalue* drops below 1, the factor explains less variance than a single variable does (all variables are standardized to mean = 0 and variance = 1). Thus we keep only those factors that explain more of the model than a single added variable would.
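The eigenvalue criterion can be checked directly from the correlation matrix. In this sketch (the matrix is invented for illustration), the eigenvalues of a correlation matrix always sum to the number of variables, so an eigenvalue above 1 means the factor carries more variance than one standardized variable:

```python
import numpy as np

# Hypothetical correlation matrix: three correlated tests plus one
# nearly independent test.
R = np.array([[1.0, 0.7, 0.6, 0.1],
              [0.7, 1.0, 0.5, 0.2],
              [0.6, 0.5, 1.0, 0.1],
              [0.1, 0.2, 0.1, 1.0]])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order

# Kaiser criterion: retain factors whose eigenvalue exceeds 1.
n_keep = int(np.sum(eigenvalues > 1.0))
print(eigenvalues.round(2), n_keep)
```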

The more complex bit is the appropriate extraction method. *Principal Components* (PCA) is the standard extraction method. It extracts uncorrelated linear combinations of the variables. The first factor captures the maximum variance; the second and all following factors explain successively smaller portions of the variance, and all factors are uncorrelated with each other. It is very similar to *Canonical Correlation Analysis*. Another advantage is that PCA can be used even when the correlation matrix is singular.

The second most common method is *principal axis* factoring, also called *common factor analysis* or *principal factor analysis*. Although mathematically very similar to principal components, it is interpreted differently: principal axis factoring identifies the latent constructs behind the observations, whereas principal components identifies groups of similar variables.

Generally speaking, principal components is preferred when using factor analysis to reduce the data, and principal axis factoring when the goal is to identify the latent constructs behind the variables, as in causal modeling. In our research question we are interested in the dimensions behind the variables, and therefore we are going to use principal axis factoring.

The next step is to select a rotation method. After extracting the factors, SPSS can rotate the factors to better fit the data. The most commonly used method is *Varimax*. *Varimax* is an orthogonal rotation method (it produces independent, uncorrelated factors, so no multicollinearity) that minimizes the number of variables that have high loadings on each factor. This method simplifies the interpretation of the factors.

A second, frequently used method is *Quartimax*. *Quartimax* rotates the factors in order to minimize the number of factors needed to explain each variable. This method simplifies the interpretation of the observed variables.

Another method is *Equamax*. *Equamax* is a combination of the *Varimax* method, which simplifies the factors, and the *Quartimax* method, which simplifies the variables. The number of variables that load highly on a factor and the number of factors needed to explain a variable are minimized. We choose Varimax.
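Varimax itself is a small iterative algorithm. The following is a plain-numpy sketch of the classic SVD-based version (SPSS additionally applies Kaiser normalization by default, which this sketch omits), applied to a hypothetical loading matrix:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a factor loading matrix (numpy sketch)."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    variance = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion, solved via SVD at each step.
        grad = rotated ** 3 - rotated * ((rotated ** 2).sum(axis=0) / n_items)
        u, s, vt = np.linalg.svd(loadings.T @ grad)
        rotation = u @ vt                      # always an orthogonal matrix
        new_variance = s.sum()
        if variance != 0.0 and new_variance < variance * (1 + tol):
            break                              # criterion stopped improving
        variance = new_variance
    return loadings @ rotation

# Hypothetical unrotated loadings: four items on two factors.
unrotated = np.array([[0.8, 0.3],
                      [0.7, 0.4],
                      [0.2, 0.9],
                      [0.3, 0.8]])
rotated = varimax(unrotated)

# Orthogonal rotation leaves the communalities (row sums of squared
# loadings) unchanged; only the distribution across factors shifts.
print(np.allclose((rotated ** 2).sum(axis=1), (unrotated ** 2).sum(axis=1)))
```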

In the dialog box *Options…* we can manage how missing values are treated; it might be appropriate to replace them with the variable's mean, so that incomplete cases are not simply dropped from the analysis. We can also specify that the output should not include all factor loadings: the loading tables are much easier to interpret when small loadings are suppressed. The default cut-off value is 0.1; it is appropriate to increase it to 0.4.

The last step would be to save the results in the *Scores…* dialog. This calculates, for every respondent, the value they would have scored had they answered the factor questions (whatever they might be) directly. Before saving these scores to the data set, we should run the factor analysis, check all assumptions, and ensure that the results are meaningful and what we are looking for; only then should we re-run the analysis and save the factor scores.
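The effect of suppressing small loadings can be sketched in a few lines of numpy; the rotated loading matrix below is invented for illustration. Loadings below the 0.4 cut-off are blanked out, which makes the simple structure of the table immediately visible:

```python
import numpy as np

# Hypothetical rotated loading matrix: five items on two factors.
loadings = np.array([[0.82, 0.12],
                     [0.76, 0.31],
                     [0.08, 0.88],
                     [0.25, 0.71],
                     [0.39, 0.44]])

# Mimic the "suppress small coefficients" display option with a 0.4
# cut-off: loadings below the threshold are blanked (shown as NaN here).
suppressed = np.where(np.abs(loadings) >= 0.4, loadings, np.nan)
print(suppressed)
```

Reading the suppressed table, items 1 and 2 load on the first factor, items 3 and 4 on the second, and item 5 loads on neither clearly, flagging it as a candidate for revision or removal.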