7.2. Regression Analysis
- Linear Regression
- Polynomial Regression
- Stepwise Regression
- Nonlinear Regression
- Logit / Probit / Gompit
- Logistic Regression
- Multinomial Regression
- Poisson Regression
- Box-Cox Regression
Regression Analysis is used to estimate the coefficients B0, …, Bm of the equation:
Y = B0 + B1X1 + … + BmXm
given n observations on m independent variables X1, …, Xm and a dependent variable Y. The Stepwise Regression procedure also determines a subset of the selected variables that contribute significantly to the explanation of variation in the dependent variable.
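As a minimal sketch of the estimation problem stated above (not the program's own implementation, and assuming ordinary least squares with purely illustrative data), the coefficients can be obtained with numpy's least-squares solver by prepending a column of ones for the constant term B0:

```python
# Minimal OLS sketch: estimate B0, ..., Bm from n observations on m
# independent variables. All names and data here are illustrative.
import numpy as np

n, m = 20, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))                   # n observations on m variables
Y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

design = np.column_stack([np.ones(n), X])     # column of ones carries B0
coef, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)
print("B0, ..., Bm:", coef)                   # estimated coefficients
```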
Any numeric column of data can be selected as the dependent variable, and the columns to be included in the analysis can be selected as independent variables. A Regression Analysis requires one column to be selected as the dependent variable and at least one column as an independent variable; the program will not proceed unless this requirement is met. Regressions can also be run on a subset of cases as determined by a combination of factor columns. The Polynomial Regression procedure allows only one independent variable to be selected and also requires the degree of the polynomial to be entered.
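For the Polynomial Regression case, a comparable sketch (again illustrative, not the program's code) builds the design matrix from powers of the single independent variable up to the entered degree:

```python
# Polynomial Regression sketch: one independent variable x, user-supplied
# degree. Columns of the design matrix are 1, x, x^2, ..., x^degree.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=30)
y = 0.5 - x + 2.0 * x**2 + rng.normal(scale=0.2, size=30)   # illustrative

degree = 2
design = np.vander(x, degree + 1, increasing=True)
coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
print("B0, ..., B%d:" % degree, coef)
```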
The Variable Selection Dialogue contains a check box for including the constant term (or intercept) in the analysis. The default is regression with a constant term, as in the above equation. If this box is unchecked, the following equation without a constant term will be estimated:
Y = B1X1 + … + BmXm.
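In code terms, omitting the constant simply means leaving out the column of ones from the design matrix, so that the fit is forced through the origin. A sketch under the same illustrative assumptions as above:

```python
# No-constant regression sketch: the design matrix contains only the
# independent variables, with no column of ones. Data are illustrative.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 2))
Y = X @ np.array([1.0, 3.0]) + rng.normal(scale=0.1, size=25)

coef, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)   # no intercept column
print("B1, ..., Bm:", coef)
```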
An important feature of regression models without a constant term is that their R-squared and adjusted R-squared values are calculated in a fundamentally different way from those of models with a constant term. Therefore, R-squared values calculated for regressions with and without a constant term are not comparable.
The standard method of calculating the R-squared value for regressions including a constant term can be expressed as:
R-squared = 1 – Var(Residuals) / Var(Dependent)
where Var() stands for variance. However, this definition fails when the constant term is omitted from the model: the residuals of a no-constant regression do not necessarily sum to zero, and the ratio can even become negative. A better definition, which applies to both types of regression, can be made by reference to the ANOVA of Regression table, where Ssq() stands for sum of squares:
R-squared = Ssq(Regression) / Ssq(Total)
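The following sketch (ordinary least squares assumed, illustrative data) checks both formulas. With a constant term the two definitions agree; without one, the ANOVA of Regression table uses uncentred sums of squares and only the sum-of-squares form remains meaningful:

```python
# Compare the variance-based and sum-of-squares R-squared definitions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)

def residuals(design, y):
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# With a constant term, both definitions give the same value.
design = np.column_stack([np.ones(len(y)), X])
res = residuals(design, y)
ssq_total = np.sum((y - y.mean()) ** 2)              # centred total
print(1 - np.var(res) / np.var(y))                   # variance form
print((ssq_total - np.sum(res**2)) / ssq_total)      # Ssq(Reg)/Ssq(Total)

# Without a constant term, the total sum of squares is uncentred and
# only the sum-of-squares form stays within [0, 1].
res = residuals(X, y)
ssq_total = np.sum(y ** 2)                           # uncentred total
print((ssq_total - np.sum(res**2)) / ssq_total)
```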
There is also a slight difference between the Linear Regression and Polynomial Regression procedures on the one hand, and the Stepwise Regression, Analysis of Variance and General Linear Model procedures on the other, in the way they handle the degrees of freedom in regressions without a constant term. In line with the most common approaches in the literature, the degrees of freedom are calculated as (n – m, m) in the Stepwise Regression, Analysis of Variance and General Linear Model procedures and as (n – m, m – 1) in the Linear Regression and Polynomial Regression procedures.
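As a small illustration of how the two conventions would enter a reported statistic, the sketch below computes the F statistic under both pairings. Reading each pair as (residual, regression) degrees of freedom, and the sums-of-squares values themselves, are assumptions made for illustration only:

```python
# Illustrative only: how the two degrees-of-freedom conventions for a
# no-constant regression would enter the F statistic. The pairs above are
# read as (residual df, regression df); the sums of squares are made up.
def f_statistic(ssq_regression, df_regression, ssq_residual, df_residual):
    return (ssq_regression / df_regression) / (ssq_residual / df_residual)

n, m = 30, 2
ssq_reg, ssq_res = 150.0, 7.5    # assumed ANOVA of Regression entries

print(f_statistic(ssq_reg, m, ssq_res, n - m))        # (n – m, m)
print(f_statistic(ssq_reg, m - 1, ssq_res, n - m))    # (n – m, m – 1)
```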
Also, although both groups of procedures operate in double precision, there may be a slight difference between their estimates on the same set of data. The reason for this is that the two groups use completely different algorithms: the Linear Regression and Polynomial Regression procedures are based on the square root free version of the Cholesky decomposition originally suggested by Gentleman (1974, Applied Statistics, 23, pp. 448-454), whereas the Stepwise Regression, Analysis of Variance and General Linear Model procedures are based on the SWEEP algorithm by Jennrich (in Statistical Methods for Digital Computers, ed. Enslein, Ralston, Wilf, 1977, Wiley, pp. 58-75). The first algorithm is more accurate, but the second is better suited to Stepwise Regression and Analysis of Variance, where variables are entered into and removed from the model one at a time.
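For readers curious about the second algorithm, the following is a compact sketch of the SWEEP operator in the spirit of the Jennrich reference (Goodnight-style conventions assumed); it is not the program's actual implementation. Sweeping the cross-product matrix [[X'X, X'y], [y'X, y'y]] on the pivots belonging to X leaves the coefficient estimates in the last column and the residual sum of squares in the bottom-right corner:

```python
# SWEEP operator sketch (conventions assumed); illustrative data only.
import numpy as np

def sweep(a, k):
    """Return a copy of matrix a swept on pivot k."""
    a = a.copy()
    d = a[k, k]
    a[k, :] /= d                    # scale the pivot row
    for i in range(a.shape[0]):
        if i != k:
            b = a[i, k]
            a[i, :] -= b * a[k, :]  # eliminate column k from row i
            a[i, k] = -b / d
    a[k, k] = 1.0 / d
    return a

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(12), rng.normal(size=(12, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=12)

z = np.column_stack([X, y])
a = z.T @ z                         # [[X'X, X'y], [y'X, y'y]]
for k in range(X.shape[1]):         # sweep each pivot belonging to X
    a = sweep(a, k)

print("coefficients:", a[:X.shape[1], -1])  # matches np.linalg.lstsq
print("residual ssq:", a[-1, -1])
```

Because each sweep is reversible, a variable can be entered or removed by sweeping or re-sweeping its pivot, which is what makes the operator well suited to Stepwise Regression.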