# Cluster Analysis

Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters.  It is also called classification analysis or numerical taxonomy.  In cluster analysis, there is no prior information about the group or cluster membership of any of the objects.

Cluster analysis has been used in marketing for various purposes.  Consumers can be segmented on the basis of the benefits they seek from the purchase of a product.  It can also be used to identify homogeneous groups of buyers.

Cluster analysis involves formulating the problem, selecting a distance measure, selecting a clustering procedure, deciding on the number of clusters, interpreting and profiling the clusters, and, finally, assessing the validity of the clustering.

The variables on which the cluster analysis is based should be selected with past research in mind.  The selection should also be guided by theory, the hypotheses being tested, and the judgment of the researcher.  An appropriate measure of distance or similarity should then be chosen; the most commonly used measure is the Euclidean distance or its square.
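As a minimal sketch of the two measures named above, the following computes the Euclidean distance and its square between two cases. The function names and the sample values are illustrative assumptions, not part of any particular package.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences between two cases, variable by variable."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))

# Two hypothetical cases measured on two variables.
case_a = (1.0, 2.0)
case_b = (4.0, 6.0)

d = euclidean(case_a, case_b)           # sqrt(3^2 + 4^2)
d2 = squared_euclidean(case_a, case_b)  # 3^2 + 4^2
```

When variables are measured in very different units, they are often standardized before distances are computed, so that no single variable dominates the result.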

Clustering procedures may be hierarchical, non-hierarchical, or two-step.  A hierarchical procedure is characterized by the development of a tree-like structure, and it can be agglomerative or divisive.  Agglomerative methods consist of linkage methods, variance methods, and centroid methods.  Linkage methods comprise single linkage, complete linkage, and average linkage.
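The three linkage methods differ only in how the distance between two clusters is defined. The sketch below, which assumes Euclidean distance and uses hypothetical function names and data, starts with every case in its own cluster and repeatedly merges the two closest clusters, recording each merge. It is written for readability rather than speed.

```python
import math
from itertools import combinations

def linkage_distance(c1, c2, points, method):
    """Distance between two clusters, each given as a list of point indices."""
    pair_dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":      # nearest pair of members
        return min(pair_dists)
    if method == "complete":    # farthest pair of members
        return max(pair_dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method}")

def agglomerate(points, method="single"):
    """Merge the two closest clusters until only one remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], points, method),
        )
        dist = linkage_distance(a, b, points, method)
        schedule.append((tuple(a), tuple(b), dist))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a + b)
    return schedule

# Four hypothetical cases forming two obvious pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
```

On this data both methods merge the same pairs first, but the final merge happens at a larger distance under complete linkage than under single linkage, because complete linkage measures the farthest members of the two clusters.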

The non-hierarchical methods are frequently referred to as k-means clustering.  The two-step procedure can automatically determine the optimal number of clusters by comparing the values of model-choice criteria across different clustering solutions.  The choice of clustering procedure and the choice of distance measure are interrelated.  The relative sizes of the clusters should be meaningful, and the clusters should be interpreted in terms of their centroids.
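A minimal sketch of the usual k-means iteration (often called Lloyd's algorithm) may make the non-hierarchical approach more concrete. It assumes Euclidean distance and takes the starting centroids as an explicit argument; that argument, the function name, and the sample data are illustrative choices, and real packages typically pick starting points for you.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Alternate between assigning cases and updating centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Six hypothetical cases in two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```

Unlike a hierarchical procedure, the number of clusters must be fixed in advance here, and the result can depend on the starting centroids.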

There are certain concepts and statistics associated with cluster analysis:

• An agglomeration schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
• A cluster centroid is the set of mean values of the variables for all the cases or objects in a particular cluster.
• A dendrogram, or tree graph, is a graphical device for displaying clustering results.
• Distances between cluster centers indicate how well separated the individual pairs of clusters are. Clusters that are widely separated are distinct and therefore desirable.
• A similarity/distance coefficient matrix is a lower-triangular matrix containing the pairwise distances between objects or cases.
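Several of these statistics can be computed directly. The sketch below, which assumes Euclidean distance and uses hypothetical data and cluster memberships, builds a lower-triangular distance matrix, the cluster centroids, and the distance between the two cluster centers.

```python
import math

def lower_triangle_distances(points):
    """Row i holds the distances from case i to cases 0..i-1."""
    return [
        [math.dist(points[i], points[j]) for j in range(i)]
        for i in range(len(points))
    ]

def cluster_centroids(points, labels):
    """Mean of each variable over the cases in each cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in groups.items()
    }

points = [(0.0, 0.0), (0.0, 2.0), (6.0, 8.0), (6.0, 10.0)]
labels = [0, 0, 1, 1]  # hypothetical cluster memberships

matrix = lower_triangle_distances(points)
centroids = cluster_centroids(points, labels)
separation = math.dist(centroids[0], centroids[1])
```

Only the lower triangle is stored because the distance from case i to case j equals the distance from j to i, and every case is at distance zero from itself.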

