8.0. Overview
Multivariate analysis is useful when the data consists of various measurements (variables) on the same set of cases. You can determine which cases can be grouped together (Cluster Analysis) or which belong to a predetermined group (Discriminant Analysis), or reduce the dimensionality of the data by forming linear combinations of the existing variables (Principal Components Analysis, Factor Analysis and Canonical Correlations). The derived configurations represent most of the variation in the original data with a smaller number of variables, enabling the user to describe the data more simply by graphical or other statistical methods. It is therefore important that the user has immediate access to 2D and 3D graphical representations of results, as well as the ability to save results as data for further analysis.
Central to most multivariate methods is the concept of proximity, which can be defined as the relationship between two points in multidimensional space. A proximity measure may reflect the similarity or dissimilarity of the two points. For instance, when we describe the relative positions of two points in terms of the distance between them, we are using a dissimilarity measure. The further apart the two points, the greater their dissimilarity; when they are identical, the dissimilarity is zero. Euclidean distance is just one dissimilarity measure among many others. Using the Multivariate Analysis module you can compute eight proximity measures from the raw data, or enter any square and symmetric matrix for analysis as proximities.
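The dissimilarity idea can be pictured with the following minimal Python sketch (illustrative only, not the program's own code): the Euclidean distance between two cases is zero when they are identical and grows as the cases move further apart.

import numpy as np

def euclidean_dissimilarity(x, y):
    """Euclidean distance between two cases: zero when identical, larger when further apart."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean_dissimilarity([1, 2, 3], [1, 2, 3]))   # 0.0  (identical cases)
print(euclidean_dissimilarity([1, 2, 3], [4, 6, 3]))   # 5.0  (further apart, more dissimilar)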
In this chapter an effort will be made to introduce the user to the basic concepts of multivariate analysis. However, as it is impossible to cover such an immense topic in a User's Guide, we recommend that the novice user consult an introductory textbook on the subject. See, for instance, Stevens, J. (1986) or Morrison, D. F. (1990).
Data types: Where relevant, the program will allow you to input raw data or an already formed proximity matrix for a multivariate procedure.
The following table shows which types of data each procedure accepts.
Procedure                    Raw Data   Proximities
Cluster Analysis
   Hierarchical              Yes        Yes
   K-th Neighbour            Yes        No
   K-Means                   Yes        No
Discriminant Analysis
   Multiple                  Yes        No
   K-NN                      Yes        No
Multidimensional Scaling     No         Yes
Principal Components         Yes        Yes
Factor Analysis
   Principal Components      Yes        Yes
   Principal Axis            Yes        Yes
Canonical Correlations       Yes        No
Reliability Analysis         Yes        No
Table 8.1. Data types accepted by multivariate procedures.
You do not need to tell the program whether the data to be analysed is raw data or a proximity matrix. The program will conclude that the data is a proximity matrix if the selected columns form a square and symmetric matrix.
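The square-and-symmetric test can be pictured as follows; this is only an illustrative Python sketch, not the program's internal code, and the tolerance is an assumed value.

import numpy as np

def looks_like_proximity_matrix(X, tol=1e-12):
    """Treat the selection as proximities only if it is square and symmetric."""
    X = np.asarray(X, dtype=float)
    return X.ndim == 2 and X.shape[0] == X.shape[1] and np.allclose(X, X.T, atol=tol)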
The following proximity matrices can be formed and input for multivariate analysis:
1) Covariance matrix
2) Correlation matrix
3) Euclid
4) Squared Euclid
5) Cosine
6) Chebychev
7) Block
8) Power
The first two matrices can be generated and saved to the Data Processor using the Statistics 1 → Matrix Statistics option. The latter six can be generated using the Cluster Analysis procedure, saved to the Data Processor and then analysed by selecting the desired multivariate procedure; a sketch of these measures is given below. It is not necessary to perform a Cluster Analysis simply to generate proximity matrices.
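The last six measures can be pictured with the following Python sketch. The formulas follow the usual textbook definitions; in particular, the role of the exponents in the Power measure is an assumption and may differ from the program's own parameterisation.

import numpy as np

def proximity(x, y, measure="euclid", p=2.0, r=2.0):
    """Proximity between two cases (rows of the raw data matrix)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = np.abs(x - y)
    if measure == "euclid":
        return np.sqrt(np.sum(d ** 2))
    if measure == "squared_euclid":
        return np.sum(d ** 2)
    if measure == "cosine":                       # a similarity, not a distance
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    if measure == "chebychev":
        return np.max(d)
    if measure == "block":                        # city-block (Manhattan) distance
        return np.sum(d)
    if measure == "power":                        # exponents p, r are assumptions
        return np.sum(d ** p) ** (1.0 / r)
    raise ValueError(f"unknown measure: {measure}")

def proximity_matrix(X, **kwargs):
    """Square, symmetric matrix of proximities between all pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = proximity(X[i], X[j], **kwargs)
    return P

A matrix produced this way is square and symmetric, so it would be recognised as proximities when selected for analysis.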
If the data selected for a multivariate analysis is not a proximity matrix, then the program will permit the formation of a standardised or non-standardised proximity matrix from the raw data. Standardised and non-standardised proximity matrices are directly proportional to the simple (Pearson) correlation matrix and the covariance matrix for the same data, respectively.
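This proportionality can be checked numerically. The sketch below is illustrative only, using a hypothetical random data set: the cross-product matrix of centred data is proportional to the covariance matrix, and that of standardised data to the Pearson correlation matrix.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))            # 20 cases, 4 variables (hypothetical data)

Xc = X - X.mean(axis=0)                 # centred (non-standardised) data
Xz = Xc / X.std(axis=0, ddof=1)         # standardised data (z-scores)

S = Xc.T @ Xc                           # proportional to the covariance matrix
R = Xz.T @ Xz                           # proportional to the correlation matrix

print(np.allclose(S / (len(X) - 1), np.cov(X, rowvar=False)))        # True
print(np.allclose(R / (len(X) - 1), np.corrcoef(X, rowvar=False)))   # True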
Missing data: If raw data is selected for analysis then any rows containing at least one missing value will be omitted (listwise deletion). Missing data are not allowed in proximity matrices.
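A minimal sketch of listwise deletion, assuming missing values are coded as NaN:

import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],               # this case has a missing value ...
              [7.0, 8.0, 9.0]])

complete_cases = X[~np.isnan(X).any(axis=1)]    # ... so only the first and third rows are kept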
Convergence criteria: The first step of almost all multivariate procedures (with the exception of Cluster Analysis) is to compute the eigenvalues and eigenvectors of the proximity matrix. The proximity matrix itself is either formed by the procedure from raw data or supplied by the user. The core algorithms adopted here are TRED2, which performs a Householder reduction of the proximity matrix, and TQL2, which applies an iterative QL algorithm to determine the eigenvalues and eigenvectors (see Smith, B. T. et al., 1976).
Iterations continue until either the reduction in the objective function is less than a given tolerance level, or the maximum number of iterations is reached. A dialogue will allow you to edit these two parameters.
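The sketch below is not the TRED2/TQL2 pair itself; it uses the cyclic Jacobi method instead, purely to illustrate how a tolerance and a maximum number of iterations control an iterative eigenvalue routine applied to a symmetric proximity matrix.

import numpy as np

def jacobi_eigen(S, tolerance=1e-10, max_iterations=50):
    """Eigenvalues/eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    A = np.asarray(S, dtype=float).copy()
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(max_iterations):
        # Objective: total off-diagonal mass, driven towards zero by the rotations.
        if np.sqrt(np.sum(np.tril(A, -1) ** 2)) < tolerance:
            break                                  # converged within the tolerance
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Rotation angle chosen to annihilate the off-diagonal element A[p, q].
                theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J
                V = V @ J
    return np.diag(A), V                           # eigenvalues, eigenvectors (columns of V)

In practice a library routine such as numpy.linalg.eigh would be used instead; the point of the sketch is the stopping rule, i.e. iterating until the off-diagonal objective falls below the tolerance or the iteration limit is reached.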
Graphics: All multivariate procedures will offer results in the form of graphical, as well as tabulated numeric output. Each graphics option has its own controls allowing you to annotate and edit the appearance of graphs. The program will display a 2D graph if you select two variables to plot, and a 3D graph if you select three variables. For common graphics controls see 2.3. Graphics Editor.
Saving results for further analysis: In Stand-Alone Mode, it is possible to save all tabulated results to the spreadsheet for further analysis by clicking on the UNISTAT icon situated on the Output Medium Toolbar.
Row labels: In tables and graphs where rows of the data matrix are displayed (e.g. in Cluster Analysis and Discriminant Analysis), rows are referred to by their labels rather than their numbers. Therefore, if the data has no Row Labels, the parts of the table or graph referring to rows will be left blank. To display row numbers as labels, return to the Data Processor, select Edit → Row Labels and then select the Years option. If an initial value of 1 is supplied, Row Labels will be set to row numbers. Although this may seem extra work on the user's part, it makes annotation of output much more flexible.