Unistat Statistics Software | Regression and ANOVA-Categorical Data

7.0.2. Categorical Data

All Analysis of Variance procedures require categorical variables to determine the group membership of each observation in a continuous data variable (see 7.3.0.1. ANOVA and GLM Data Format).

Missing data handling: Any row containing one or more missing values is omitted. For Analysis of Variance procedures, this applies to the factor columns as well as the continuous data variable.

As with the matrix format data explained above, selecting more than one dependent variable needs special consideration here. In such cases, rows with missing values are omitted separately for each run (that is, for each dependent variable).
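The per-run omission described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the program's own code: the helper names `is_missing` and `complete_rows`, the sample columns, and the choice to treat `None` and NaN as missing are all assumptions made for the example.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_rows(columns, factors, dependent):
    """Return indices of rows with no missing value in the factors or the dependent."""
    needed = list(factors) + [dependent]
    n_rows = len(columns[dependent])
    return [
        i for i in range(n_rows)
        if not any(is_missing(columns[name][i]) for name in needed)
    ]

# Hypothetical data: one factor column and two dependent variables.
data = {
    "group": ["a", "a", "b", "b", None],
    "y1": [1.0, None, 3.0, 4.0, 5.0],
    "y2": [2.0, 2.5, float("nan"), 4.5, 5.5],
}

# Each dependent variable gets its own run, with its own set of usable rows.
rows_per_run = {dep: complete_rows(data, ["group"], dep) for dep in ("y1", "y2")}
print(rows_per_run)
```

Note that a missing value in `y1` does not remove that row from the run on `y2`, whereas a missing factor value removes the row from every run.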

Factors: A column used to classify the observations of another column is called a factor. A factor may be a numeric or string variable with a limited number of distinct values (levels). Any number of data columns may be selected as factors from the Variables Available list by clicking on [Factor]. The order of selection is significant. All Analysis of Variance procedures also require one or more continuous data (dependent) variables, which are selected by clicking on [Dependent]. When more than one [Dependent] variable is selected, the same model is run separately with each of them.
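As a rough illustration of how factors classify the observations of a dependent variable, the following Python sketch groups each dependent column by the combination of factor levels in the same row. The function `group_by_factors` and the column names are hypothetical, and the sketch does not attempt to reproduce how the program uses the order of factor selection; here the order only determines the order of the key tuple.

```python
from collections import defaultdict

def group_by_factors(columns, factors, dependent):
    """Collect dependent values under each combination of factor levels."""
    cells = defaultdict(list)
    for i, y in enumerate(columns[dependent]):
        # The key follows the order in which the factors were listed.
        key = tuple(columns[f][i] for f in factors)
        cells[key].append(y)
    return dict(cells)

# Hypothetical data: two factors and two dependent variables.
data = {
    "dose": ["low", "low", "high", "high"],
    "sex": ["f", "m", "f", "m"],
    "weight": [10.0, 12.0, 15.0, 17.0],
    "height": [1.0, 2.0, 3.0, 4.0],
}

# The same factor layout is applied to each dependent variable in turn.
cells = {
    dep: group_by_factors(data, ["dose", "sex"], dep)
    for dep in ("weight", "height")
}
by_dose = group_by_factors(data, ["dose"], "weight")
print(by_dose)
```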

Levels: A numeric or string variable selected as a factor must have a limited number of distinct values, which are called levels. Levels can be integers, floating point numbers, or strings of any length. The program first scans a factor column and registers its distinct values. If the number of distinct values is too large, the program may run out of memory. In this case, a message is displayed and the procedure is aborted.
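The level scan can be pictured with a short Python sketch. It is a simplified stand-in, not the program's actual routine: `scan_levels` is a hypothetical name, the `max_levels` cap is an assumed substitute for the memory limit, and recording levels in order of first appearance is likewise an assumption.

```python
def scan_levels(column, max_levels=1000):
    """Register the distinct values of a factor column.

    max_levels is a hypothetical cap standing in for the memory limit;
    exceeding it aborts the scan with an error.
    """
    levels = {}
    for value in column:
        if value is None:
            continue  # missing entries do not define a level
        if value not in levels:
            if len(levels) >= max_levels:
                raise ValueError("too many distinct levels in factor column")
            levels[value] = len(levels)  # remember order of first appearance
    return list(levels)

# Levels may be numbers or strings.
print(scan_levels(["low", "high", "low", None, "high"]))
print(scan_levels([2.5, 1.0, 2.5, 3.0]))
```

A column with many distinct values, such as a continuous measurement selected as a factor by mistake, would hit the cap and stop the procedure.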
