8.1. Cluster Analysis
Cluster Analysis is used to determine the inherent or natural groupings in data, or to provide a convenient summary of the data in a given number of groups. There are two main categories of Cluster Analysis: hierarchical and nonhierarchical. The main difference is that hierarchical methods form clusters sequentially, starting with the most similar pair and building larger clusters step by step, whereas nonhierarchical methods evaluate the overall distribution of pairs and then classify the cases into a given number of groups.
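The contrast between the two categories can be illustrated with a minimal Python sketch. It is not this program's implementation: the function names `single_linkage` and `k_means`, the toy data, the Euclidean distance and the choice of starting centres are all assumptions made for illustration. The first function merges the closest pair of clusters at each step, while the second assigns every case to one of a fixed number of groups and refines the assignment iteratively.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points, n_groups):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # every case starts on its own
    merges = []  # history of (cluster_a, cluster_b, distance)
    while len(clusters) > n_groups:
        best = None
        # Scan all cluster pairs; single linkage uses the smallest
        # distance between any member of one and any member of the other.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters, merges


def k_means(points, centres, n_iter=10):
    """Basic K-means iteration from given starting centres (an assumption)."""
    centres = [tuple(c) for c in centres]
    n_dims = len(points[0])
    groups = []
    for _ in range(n_iter):
        # Assign each case to its nearest centre.
        groups = [[] for _ in centres]
        for i, p in enumerate(points):
            nearest = min(range(len(centres)), key=lambda k: dist(p, centres[k]))
            groups[nearest].append(i)
        # Move each centre to the mean of its members (keep it if empty).
        centres = [
            tuple(sum(points[i][d] for i in g) / len(g) for d in range(n_dims))
            if g else c
            for g, c in zip(groups, centres)
        ]
    return groups


# Hypothetical toy data: five cases (rows) measured on two variables.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.3, 4.9), (9.0, 1.0)]

hier_clusters, merge_history = single_linkage(points, n_groups=2)
km_groups = k_means(points, centres=[points[0], points[2]])

for left, right, d in merge_history:
    print(f"merged {left} and {right} at distance {d:.3f}")
print("hierarchical:", hier_clusters)
print("k-means:     ", km_groups)
```

The merge history is what a dendrogram would display, which is why only the hierarchical approach produces one. K-means returns only the final grouping, and its result generally depends on the starting centres.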
Clusters are formed row-wise. If the data is not already in this form, you may use Data Processor’s Data → Transpose Matrix utility to obtain the correct format. There is no limitation on the number of cases to be clustered, except for the available memory and hard disk space. But beware: this is an n³ procedure (computing time grows with the cube of the number of cases) and you may have to wait a bit (!) if you have thousands of cases. Also, it is not possible to draw character dendrograms for more than 800 cases.
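As a rough illustration of the two points above, the following Python sketch shows what transposing a data matrix does and why the workload climbs so quickly. The helper names `transpose` and `naive_pair_scans` are hypothetical, and the cost model assumes a naive agglomerative pass that rescans every remaining cluster pair before each merge; the program's actual algorithm may be organised differently.

```python
def transpose(matrix):
    """Swap rows and columns so that each case to be clustered becomes a row."""
    return [list(column) for column in zip(*matrix)]


def naive_pair_scans(n_cases):
    """Cluster pairs examined by a naive agglomerative pass (assumed model).

    With k clusters remaining there are k * (k - 1) / 2 pairs to scan,
    and k runs from n_cases down to 2, giving a total of order n cubed.
    """
    return sum(k * (k - 1) // 2 for k in range(2, n_cases + 1))


# Hypothetical layout: cases stored in columns, so they must be transposed.
cases_in_columns = [[1, 2, 3],
                    [4, 5, 6]]
cases_in_rows = transpose(cases_in_columns)
print(cases_in_rows)

# Doubling the number of cases multiplies the work by roughly eight.
for n in (1000, 2000):
    print(n, naive_pair_scans(n))
```

Under this assumed model, doubling the number of cases multiplies the number of pair scans by about eight, which is the practical meaning of the warning for data sets with thousands of cases.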
This implementation of Cluster Analysis provides nine hierarchical (Average Between Groups, Average Within Groups, Single Linkage, Complete Linkage, Centroid, Median, Ward, McQuitty, Flexible), one modified hierarchical (K-th neighbour) and one nonhierarchical (K-means) method.