Unistat Statistics Software | General Linear Model

7.3.2. General Linear Model

UNISTAT’s GLM procedure can handle unbalanced designs where the number of observations in cells are not necessarily equal. Data may contain missing values and / or missing cells. Factor columns may be numeric, string, date or time variables and need not be sorted. It is possible to select an unlimited number of dependent variables, factors, repeated measures and covariates. When more than one dependent variable is selected, the analysis will be repeated for each dependent variable with the rest of the settings unchanged.

The GLM procedure assumes that each factor has a maximum of 1000 levels, though this number may be increased by the user, if necessary. It is possible to analyse simple factorial, repeated measures, nested and mixed designs. The output options include table of means, coefficients, fitted values and residuals and their plots. It is also possible to perform multiple comparison tests on the GLM model fitted.

This procedure allows more flexibility in defining the ANOVA model to be used. All the previous models can be built using the General Linear Model procedure. Also, this procedure can handle 4 way and above interactions / nested terms and allows the specification of the F-ratio terms.

The GLM procedure always adopts the Regression Approach to sum of squares. This means the factors can be selected in any order and the same sum of squares will be obtained.

7.3.2.1. GLM Variable Selection

General Linear Model

The variable selection for General Linear Model is slightly different from the ANOVA procedures. When a selection is made from the Variables Available list on the left, the variable remains there, allowing it to be selected again. The [Dependent] and [Covariate] buttons work as before (see 7.3.1.1. ANOVA Variable Selection). The [Factor], [Interaction], [Full] and [Nest] all add factors to the factor list box. They do this in the following way:

Factor: is used to select a single factor variable. If multiple variables are highlighted in the Variables Available list, then all these variables will be added to the factors list.

Interaction: is used to select an interaction term. This will only be active when multiple variables are selected in the Variables Available list. A single factor of the form FactA x FactB x … is added. You can always highlight multiple nonadjacent variables by pressing on the [Ctrl] key and then clicking on the desired variable.

A special cross sign is used between the variables in an interaction term. If there is a problem with this character on a non-English operating system, you can enter and edit the following line in Documents\Unistat10\Unistat10.ini file under [Options] to display any other character (say x):

InteractionCross=x

This character also appears in all ANOVA and GLM output with interaction terms.

Full: is used to add all the factors and interactions for a fully factorial model. This will only be active when multiple variables are selected in the Variables Available list. You can always highlight multiple nonadjacent variables by pressing on the [Ctrl] key and then clicking on the desired variable.

Nest: is used to create a nested term. This will only be active when a variable is selected in the Variables Available list and a factor is highlighted in the factors list on the right. When selected, this will create a nested factor of the form FactA(FactB). This can be repeated to create a multilevel nested factor e.g. FactA(FactB(FactC)). Nests on interactions, such as FactA(FactB x FactC), are not allowed.

PolyContr: This button is used to request orthogonal polynomial contrasts for factors or interactions of factors. When a factor or interaction term is selected on the Variables Selected list, the [PolyContr] button will be enabled. The selected factors should contain minimum two and maximum six levels. Otherwise the polynomial contrasts cannot be computed. Contrasts for interaction terms are obtained from the dummy variables created for the main factors.

Polynomial contrasts should be used when the factor levels are equally spaced, although UNISTAT will not check for this condition and compute contrasts anyway.

The following table shows the coefficients used in dummy variables for factors with two to six levels. Each coefficient is divided by the Divisor displayed on the right of the table.

Factor Levels	Polynomial Degree	D1	D2	D3	D4	D5	D6	Divisor
2	1	-1	1					2
3	1	-1	0	1				2
	2	1	-2	1				6
4	1	-3	-1	1	3			20
	2	1	-1	-1	1			4
	3	-1	3	-3	1			20
5	1	-2	-1	0	1	2		10
	2	2	-1	-2	-1	2		14
	3	-1	2	0	-2	1		10
	4	1	-4	6	-4	1		70
6	1	-5	-3	-1	1	3	5	70
	2	5	-1	-4	-4	-1	5	84
	3	-5	7	4	-4	-7	5	180
	4	1	-3	2	2	-3	1	28
	5	-1	5	-10	10	-5	1	252

Dependent: It is compulsory to select at least one column containing numeric data. When more than one data variable is selected, the analysis will be repeated as many times as the number of data variables, each time only changing the data variable and keeping the rest of selections unchanged.

Covariate: A covariate is a continuous (noncategorical) numeric variable that may be included in the analysis in order to remove its effect on the dependent variable. Any number of covariates can be selected from the Variables Available list by clicking on [Covariate]. When covariates are included in the analysis, an adjustment is made in the sum of squares for them before any other factors.

Repeated: This button is used to select a factor to define Within and Between Subjects terms. These can be used in the F-ratio selection to create a number of different repeated measure designs. If a variable is selected as [Repeated], then at least one factor must use the Error Within term in its F-ratio (see 7.3.2.2. GLM F-Ratio Selection).

7.3.2.2. GLM F-Ratio Selection

General Linear Model

This dialogue allows the denominator in the F-ratio of each factor to be specified independently. The possible selections include:

Residual: The denominator is given by the total error of the model. This is the default in GLM and it is also the denominator used in ANOVA.

None: The F-statistic and its tail probability are not calculated.

Error Term: The factor is not included in the model – also known as Fixed Factor. That is, the sum of squares calculated is not included in the explained sum of squares and hence the residual sum of squares is not adjusted for this factor. The sum of squares is reported separately from the factors included in the model. R-squared and adjusted R-squared are not reported for a model that includes error terms.

Any Factor’s Mean Square: The denominator is given by the mean square of the selected factor.

Error Between Subjects: This option will only be available when a column is selected as [Repeated] in the variable selection. The denominator is given by the Error Between Subjects mean square. This value is calculated by the sum of squares of the selected column minus the sum of square of all the factors with denominator selected as Error Between Subjects. So the mean square value depends on which factors are selected as Error Between Subjects.

Error Within Subjects: This option will only be available when a column is selected as [Repeated] in the variable selection. This term partitions the residual sum of squares with the Error Between Subjects. So the mean square value depends on which factors are selected as Error Between Subjects.

7.3.2.3. GLM Output Options

General Linear Model

The main output of the GLM procedure is the ANOVA table. However, GLM also reports a wide range of diagnostic statistics, post hoc tests and plots.

As of this version of UNISTAT, three more options have been added to the Output Options Dialogue. It is now possible to include any or all three output tables for each test.

Use Least Squares Means: The following three output options are preceded by an asterisk in the Output Options Dialogue;

· Table of Means,

· Multiple Comparisons and

· Profile Plot.

The common point for these options is that they display information based on cell and / or marginal means (where a cell is defined as a unique combination of factor levels used in the model and a marginal mean is the mean of a group of cell means). It is possible to display these output options based on the least squares means (LSMeans or adjusted means) instead of arithmetic means. To do this check the Use Least Squares Means box at the top of the dialogue. Then these three output options will use the least squares means and this will be indicated on the output. Note that the least squares means output will not be available when a nested term exists in the model.

Normally, in balanced designs arithmetic and least squares means will not be different. However, in unbalanced designs with more than one effect, the arithmetic mean of a group may not represent an appropriate response for that group, since it does not take other effects in the model into account. Least squares means can also be described as within-group means adjusted for the other effects in the model. In other words, least squares means are predicted population margins which estimate the marginal means over a balanced population.

The least squares means are computed as:

where L is the hypothesis matrix and is the vector of estimated regression coefficients as displayed in the third output option:

The standard error of LSMeans is defined as:

where MSE is the mean square error for the model.

To save the hypothesis matrix L in a file enter the following line under the [Options] section of Documents\Unistat10\Unistat10.ini file:

GLMSaveLMatrix=1

The matrix will be saved in:

..\Documents\Unistat10\GLMLMatrix.txt

To save the matrix enter the following line under the [Options] section of Unistat10.ini:

GLMSaveInvXpXMatrix=1

The matrix will be saved in the file:

..\Documents\Unistat10\GLMInvXpXMatrix.txt

Some of the GLM Output Options (such as coefficients, fitted values, residuals and their plots) are based on the underlying regression model for the GLM model. These are normally identical to the results obtained from the Linear Regression procedure if the same model is constructed using dummy variables (see 7.3.2.4. GLM Example below).

ANOVA: The factors in the model appear at the top of the table. Any factors that have been calculated but are not in the model appear at the bottom of the table in a section separated with a dividing line. These values may be used as denominators in F-statistic calculations. If a repeated measure has been specified, then the factor will be split into Between Subjects and Within Subjects terms.

* Table of Means: Number of cases, mean, standard deviation, standard error and the lower and upper limits of the confidence interval are displayed for each cell and marginal mean of the model. Missing cells are omitted from the analysis at the outset. This output option is similar to the Table of Means procedure available under Tests for ANOVA, but has the advantage of matching exactly the model specified in the GLM procedure. When the Use Least Squares Means box is checked, LSMeans will be displayed instead of arithmetic means.

Coefficients: Estimated coefficients for the underlying regression model are displayed. The Row Labels of this table are identical to that of Table of Means output above. Variables causing multicollinearity will be displayed with a zero coefficient at the end of the coefficients table. If you do not wish to display these variables enter the following line in the [Options] section of Documents\Unistat10\Unistat10.ini file:

DispCollin=0

Case (Diagnostic) Statistics: The actual, fitted and residual Y values for the underlying regression model are displayed.

* Multiple Comparisons: This output option is similar to the Multiple Comparisons procedure available under Tests for ANOVA, but has a major advantage. While the latter is always based on a one-way ANOVA model, this option takes the Mean Square Error and its Degrees of Freedom from the estimated GLM model. Note that both procedures give you the opportunity to override the suggested Mean Square Error and its Degrees of Freedom. When the Use Least Squares Means box is checked, comparisons will be made between LSMeans instead of arithmetic means.

General Linear Model

When the data used in multiple comparisons is already transformed with a function like e or 10 based logarithm, the results need to be transformed back to the original scale and the user is faced with the task of applying back transformations manually. The Antilog box allows the user to specify which back-transformation is to be applied in the output. The output values that are affected by this control are:

· means,

· difference between means, and

· lower and upper confidence limits for difference between means.

Let X be the value entered into the Antilog box.

· If X = 0 then no back-transformation is performed.

· If X = 1 then the natural antilog of the output value Y, Exp(Y) is displayed.

· If 1 < X ≤ 16, then X-based power of Y, X^Y is displayed.

When a back-transformation base value is specified, the columns affected will be marked by an asterisk in the output.

Plot of Actual and Fitted Values: Select this option to plot actual and fitted Y values against row numbers (index).

Plot of Residuals: Residuals are plotted against row numbers (index).

Normal Plot of Residuals: Residuals are plotted against the normal probability (probit) axis.

* Profile Plot: The cell means for an unlimited number of factors can be plotted. If only one factor is selected, the plot is similar to the Means Plot with Error Bars option available in X-Y Plots (see 4.1.1.3. Means Plot). When the Use Least Squares Means box is checked, LSMeans will be plotted instead of arithmetic means.

If two or more factors are selected, the first factor’s levels are represented on the X-Axis and the second factor’s levels as separate lines. This plot is known as Interaction Plot and it is different from a Means Plot with two factors (where interaction terms are plotted either on the X-Axis or as separate lines). The Interaction Plot is useful for comparing means of interaction terms. If the lines are parallel, then it can be concluded that there is no interaction between the two factors.

General Linear Model

To change the default factor selections, you can click on the [Opt] button situated to the left of this output option. A further Variable Selection Dialogue is displayed. It is compulsory to select a [Row Factor] variable, in which case a Means Plot is displayed for the selected factor. If an optional [Column Factor] is also selected, then an Interaction Plot is displayed.

General Linear Model

A third [Factor] button allows selection of an unlimited number of further factor columns. If at least one such factor is selected, then the program displays a further dialogue showing the levels of this factor. If two or more factors are selected then the dialogue shows the combinations of all selected factors. Only one level (or level combination) needs to be selected and the final plot is drawn for this selected subsample. When [Factor] columns are selected, selection of a [Column Factor] still remains optional.

General Linear Model

In earlier versions of UNISTAT, error bars for means plot represented only the standard error of mean. Now, (as in Means Plot), the Error Bars control on the Edit → Data Series dialogue allows selecting one of the following dispersion measures: None, t-interval, Z-interval, Standard Error, Standard Deviation, Variance

7.3.2.4. GLM Example

Published examples using GLM have already been introduced in section 7.3.0.2. ANOVA Designs. Here we shall demonstrate the full GLM output with a simple example. Next, we shall run the same model using the Linear Regression procedure and emphasise the similarities and differences.

Open DEMODATA and select Statistics 1 → ANOVA and GLM → General Linear Model. Highlight Region (C10) and Type (C11) on the Variables Available list and then click on the button [Full]. This will add two main factors and an interaction term to the Variables Selected list. Also select Output2 (C9) as [Dependent]. Accept default values suggested in the next two dialogues. Some of the following results have been shortened due to space considerations.

General Linear Model

ANOVA

Dependent Variable: Output2

Due To	Sum of Squares	DoF	Mean Square	F-Stat	Prob
Constant	669862.660	1	669862.660	281.165	0.0000
Region	70.934	2	35.467	0.774	0.4664
Type	1.122	1	1.122	0.024	0.8762
Region x Type	174.832	2	87.416	1.908	0.1586
Explained	206.738	5	41.348	0.902	0.4866
Error	2382.454	52	45.816
Total	2589.193	57	45.424

R-squared =	0.0798
Adjusted R-squared =	-0.0086

Table of Means

	Cases	Mean	Standard Deviation	Standard Error	Lower 95%	Upper 95%
whole sample	58	107.4679	6.7398	0.8850	105.6958	109.2401
Region = 1	14	106.1586	6.9131	1.8476	102.1671	110.1501
2	26	107.8269	6.7597	1.3257	105.0966	110.5572
3	18	107.9678	6.8330	1.6106	104.5698	111.3658
Type = 1	16	107.0837	6.9366	1.7342	103.3875	110.7800
2	42	107.6143	6.7430	1.0405	105.5130	109.7156
Region x Type = 1 x 1	7	103.6271	7.7791	2.9402	96.4327	110.8216
1 x 2	7	108.6900	5.2991	2.0029	103.7892	113.5908
2 x 1	5	111.5200	1.7219	0.7701	109.3820	113.6580
2 x 2	21	106.9476	7.2320	1.5782	103.6557	110.2396
3 x 1	4	107.5875	7.3881	3.6940	95.8315	119.3435
3 x 2	14	108.0764	6.9572	1.8594	104.0594	112.0934

Coefficients

	Coefficient	Standard Error	t-Statistic	Probability	Lower 95%	Upper 95%
Constant	108.0764	1.8090	59.7426	0.0000	104.4463	111.7065
Region = 1	0.6136	3.1333	0.1958	0.8455	-5.6740	6.9011
2	-1.1288	2.3355	-0.4833	0.6309	-5.8153	3.5576
3	*
Type = 1	-0.4889	3.8375	-0.1274	0.8991	-8.1895	7.2117
2	*
Region x Type = 1 x 1	-4.5739	5.2742	-0.8672	0.3898	-15.1574	6.0096
1 x 2	*
2 x 1	5.0613	5.1060	0.9912	0.3262	-5.1848	15.3074
2 x 2	*
3 x 1	*
3 x 2	*

* omitted due to multicollinearity

Case (Diagnostic) Statistics

	Actual Y	Fitted Y	Residuals
1	103.3600	108.0764	-4.7164
2	105.1300	108.6900	-3.5600
3	105.9700	106.9476	-0.9776
…	…	…	…
56	93.1100	103.6271	-10.5171
57	93.2000	106.9476	-13.7476
58	93.4700	108.0764	-14.6064

Dunnett

For Output2, classified by Region

Control Group: 1, Two-Tailed Test

Mean Square Error: 45.8164292353523, Degrees of Freedom: 52

** denotes significantly different pairs. Vertical bars show homogeneous subsets.

A pairwise test result is significant if its q stat value is greater than the table q.

Group	Cases	Mean	1
1	14	106.1586		\|
2	26	107.8269		\|
3	18	107.9678		\|

Comparison	Difference	Standard Error	q Stat	Table q	Probability
3 – 1	1.8092	2.4120	0.7501	2.2581	0.6578
2 – 1	1.6684	2.2438	0.7435	2.2581	0.6623

Comparison	Lower 95%	Upper 95%	Result
3 – 1	-3.6375	7.2559
2 – 1	-3.3985	6.7352

Homogeneous Subsets:
Group 1:	1 2 3
Pooled mean =	107.4679
95% Confidence Interval =	105.6845 <> 109.2514

General Linear Model

Now we run the same model using Linear Regression.

Open DEMODATA and select Statistics 1 → Regression Analysis → Linear Regression. Highlight Region (C10) and Type (C11) on the Variables Available list and then click on the button [Full]. This will add two main factors and an interaction term to the Variables Selected list. Also select Output2 (C9) as [Dependent]. From the Output Options Dialogue select only the Regression Results and ANOVA of Regression. The following results are obtained.

The Regression Results option produces exactly the same results as the GLM model. In the ANOVA of Regression table, Regression, Error and Total terms are also identical to Explained, Error and Total of GLM’s ANOVA table. Moreover, if we add the two interaction Ssq’s (1×1 and 2×1) we obtain the Region x Type Ssq of GLM. Main interactions are, however, different. This is because the GLM Procedure adopts the Regression Approach when it computes the sum of squares, whereas Linear Regression’s sum of squares decomposition is sequential. See 7.3.1.3. ANOVA Approaches.

Linear Regression

Dependent Variable: Output2

Valid Number of Cases: 58, 0 Omitted

Regression Results

* omitted due to multicollinearity

	Coefficient	Standard Error	t-Statistic	Probability	Lower 95%	Upper 95%
Constant	108.0764	1.8090	59.7426	0.0000	104.4463	111.7065
Region = 1	0.6136	3.1333	0.1958	0.8455	-5.6740	6.9011
2	-1.1288	2.3355	-0.4833	0.6309	-5.8153	3.5576
Type = 1	-0.4889	3.8375	-0.1274	0.8991	-8.1895	7.2117
Region x Type = 1 x 1	-4.5739	5.2742	-0.8672	0.3898	-15.1574	6.0096
2 x 1	5.0613	5.1060	0.9912	0.3262	-5.1848	15.3074
Region = 3	*
Type = 2	*
Region x Type = 1 x 2	*
2 x 2	*
3 x 1	*
3 x 2	*

Residual Sum of Squares =	2382.4543
Standard Error =	6.7688
Mean of Y =	107.4679
Stand Dev of y =	6.7398
Correlation Coefficient =	0.2826
R-squared =	0.0798
Adjusted R-squared =	-0.0086
F(5,52) =	0.9025
Probability of F =	0.4866
Durbin-Watson Statistic =	0.1641
log of likelihood =	-190.2131
Press Statistic =	2916.1898

ANOVA of Regression

Due To	Sum of Squares	DoF	Mean Square	F-Stat	Prob
Region = 1	31.639	1	31.639	0.691	0.4098
2	0.211	1	0.211	0.005	0.9462
Type = 1	0.057	1	0.057	0.001	0.9721
Region x Type = 1 x 1	129.815	1	129.815	2.833	0.0983
2 x 1	45.017	1	45.017	0.983	0.3262
Regression	206.738	5	41.348	0.902	0.4866
Error	2382.454	52	45.816
Total	2589.193	57	45.424	0.991	0.5142

Previous topic | Next topic