Unistat Statistics Software | Poisson Regression

7.2.8. Poisson Regression

The Poisson Regression procedure is suitable for models where the dependent variable is a frequency (count) variable consisting of nonnegative integers. The exponential of estimated regression coefficients are called Incidence Rate Ratios, which give the estimated rate at which events occur. This rate can be multiplied by an Exposure variable to obtain the expected frequencies, which enters the model with a coefficient constrained as 1.

Predictions (interpolations) and multicollinearity are handled as in other regression options (see, for instance, Logistic Regression). Predicted cases are identified by an asterisk in Case (Diagnostic) Statistics output options (see 7.2.8.3. Poisson Regression Output Options). The spreadsheet Reg function (see 3.4.2.6.3. UNISTAT Functions) will give the natural log of predicted values, which should be exponentiated to obtain the expected frequencies.

7.2.8.1. Poisson Regression Model Description

Poisson Regression assumes that actual frequencies y_i are drawn from a Poisson distribution with parameters λ_i, i = 1, …, n. The associated probabilities are given as:

where:

is known as the loglinear model.

The logarithm of the likelihood function is given as:

and the first derivatives are:

A Newton-Raphson type maximum likelihood algorithm is employed to minimise the negative of the log likelihood function. The nature of this method implies that a solution (convergence) cannot always be achieved. In such cases, you are advised to edit the convergence parameters provided and try again.

7.2.8.2. Poisson Regression Variable Selection

Poisson Regression

As in other regression procedures, Poisson Regression can be used to estimate models with or without a constant term, with or without weights and regressions can be run on a subset of cases as determined by the levels of an unlimited number of factor columns. An unlimited number of dependent variables can be selected in order to run the same model on different dependent variables. It is also possible to include interaction terms, dummy and lag/lead variables in the model, without having to create them as spreadsheet columns first (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

It is compulsory to select at least one numeric data column containing frequency counts as a dependent variable. When more than one dependent variable is selected, the analysis will be repeated as many times as the number of dependent variables, each time only changing the dependent variable and keeping the rest of selections unchanged.

A column containing numeric data can be selected as a weights column. Weights are frequency weights and all independent variables are multiplied by this column internally by the program.

An intermediate inputs dialogue is displayed next.

Poisson Regression

Tolerance: This value is used to control the sensitivity of nonlinear minimisation procedure employed. Under normal circumstances, you do not need to edit this value. If convergence cannot be achieved, then larger values of this parameter can be tried by removing one or more zeros.

Maximum Number of Iterations: When convergence cannot be achieved with the default value of 100 function evaluations, a higher value can be tried.

Omit Level: This field will appear only when one or more dummy variables have been included in the model from the Variable Selection Dialogue. Three options are available; (0) do not omit any levels, (1) omit the first level and (2) omit the last level. When no levels are omitted, the model will usually be over-parameterised (see 2.1.4. Creating Interaction, Dummy and Lag/Lead Variables).

Exposure / Offset: This field will appear only if a column has been assigned the task of [Exposure] in Variable Selection Dialogue. When the value of this field 0 the variable selected (E) will enter the model as Exposure:

and for any other value it will enter the model as Offset:

7.2.8.3. Poisson Regression Output Options

The Actual and Fitted Values, Residuals and Confidence Intervals for Mean and Actual Y Values options that existed in earlier versions of UNISTAT have now been merged with the Case (Diagnostic) Statistics option. See 7.2.1.2.2. Linear Regression Case Output.

Poisson Regression

Regression Results: The main regression output displays a table of estimated coefficients for each category of the dependent variable, except for the base category. Standard errors, Wald statistics, probability values and confidence intervals are also displayed for the estimated regression coefficients.

Wald Statistic: This is defined as:

and has a chi-square distribution with one degree of freedom.

Confidence Intervals: The confidence intervals for regression coefficients are computed from:

where k is the number of independent variables in the model and each coefficient’s standard error, σ_i, is the square root of the diagonal element of covariance matrix.

Goodness of Fit Tests: See 7.2.6.4.1. Logistic Regression Results for details.

Correlation Matrix of Regression Coefficients: This is a symmetric matrix with unity diagonal elements. The off-diagonal elements give correlations between the regression coefficients.

Covariance Matrix of Regression Coefficients: This is a symmetric matrix where diagonal elements are the square of parameter standard errors. The off-diagonal elements are covariances between the regression coefficients.

Incidence Rate Ratio: Values of the incidence rate ratio indicate the influence of one unit change in a covariate on the regression.

The standard error of the incidence rate ratio is:

where σ_i is the standard error of the i^th independent variable for the j^th category of the dependent variable. Coefficient confidence intervals are:

which are simply the exponential of the coefficient confidence intervals.

Case (Diagnostic) Statistics: Case statistics are useful to determine the influence of individual observations on the overall fit of the model. For further information see 7.2.1.2.2. Linear Regression Case Output.

Statistics available under this option are defined as follows.

Poisson Regression

Actual Y: Observed values of the dependent variable.

Fitted Y: Also known as expected values:

Standard Error of Fitted:

Confidence Intervals of Fitted:

Deviance:

Poisson Regression

Residuals:

Standardised Residuals:

Plot of Actual and Fitted Values: Select this option to plot actual and fitted Y values against row numbers (index), residuals or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list.

Poisson Regression

By default, a line graph of the two series is plotted. However, since this procedure (like the plot of residuals) uses the X-Y Plots engine, it has almost all controls and options available for X-Y Plots, except for error bars and right Y-axes.

The data points on the graph will also respond to the right mouse button in the way X-Y Plots does; the point is highlighted, a panel displays information about the point and in Stand-Alone Mode, the row of the spreadsheet containing the data point is also highlighted (a procedure which is also known as Brushing or Point identification). While the point is highlighted you can press <Delete> to omit the particular row containing the point. The entire Regression Analysis will be run again without the deleted row. If you want to restore the original regression, you will need to take one of the following two actions depending on the way you run UNISTAT:

1. In Stand-Alone Mode, go back to the Data Processor and delete or deactivate the Select Row column created by the program.

2. In Excel Add-In Mode, highlight a different block of data to remove the effect of the internal Select Row column.

Plot of Residuals: Residuals can be plotted against row numbers (index), fitted values or against any independent variable. A further dialogue will enable you to choose the X-axis variable from a list containing Row Numbers, Fitted Values and all independent variables.

By default a scatter graph of residuals is plotted. For more information on available options see Plot of Actual and Fitted Values above.

7.2.8.4. Poisson Regression Examples

Example 1

Example 14.4 on p. 501 Armitage & Berry (2002). The aim is to assess whether there is a significant difference in cancer risk between veterans and non-veterans. The servicemen are divided into 11 age groups and their experience is given in terms of subject-years.

Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. Select Status and Age group (L1 and L2) as [Dummy], Number of cancers (C3) as [Dependent] and Subject-years (C4) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Uncheck all output options except for Regression Results and click [Finish].

Armitage and Berry include the logarithm of the Subject-years variable in the model as an Offset variable, which is equivalent to an Exposure variable.

Regression results show that the p-value of Status = Veteran variable is 0.9493. As this is much greater than 5%, we can conclude that there is no significant difference in cancer risk between veterans and non-veterans.

Poisson Regression

Dependent Variable: Number of cancers

Exposure: Subject-years

Valid Number of Cases: 22, 0 Omitted

Regression Results

	Coeffi cient	Standard Error	Wald Statistic	Signifi cance	Lower 95%	Upper 95%
Constant	-9.3248	0.2045	2079.0648	0.0000	-9.7257	-8.9240
Status = Veteran	-0.0035	0.0555	0.0040	0.9493	-0.1123	0.1052
Age group = 25-29	0.6793	0.2325	8.5373	0.0035	0.2236	1.1350
30-34	1.3711	0.2177	39.6626	0.0000	0.9444	1.7978
35-39	1.9396	0.2121	83.6581	0.0000	1.5240	2.3553
40-44	2.0343	0.2161	88.6203	0.0000	1.6108	2.4579
45-49	2.7266	0.2222	150.5414	0.0000	2.2910	3.1621
50-54	3.2029	0.2206	210.7121	0.0000	2.7704	3.6353
55-59	3.7162	0.2178	291.1924	0.0000	3.2894	4.1430
60-64	4.0927	0.2177	353.4845	0.0000	3.6660	4.5193
65-69	4.2362	0.2242	356.9375	0.0000	3.7967	4.6757
70-	4.3637	0.2274	368.3262	0.0000	3.9181	4.8094

Example 2

Example 19.21 on p. 932 Greene (1997). The number of accidents per service month is given for a sample of ship types. There are five types of ships constructed in four different time periods and observed in two time periods.

Open POISSON and select Statistics 1 → Regression Analysis → Poisson Regression. From the Variable Selection Dialogue select Type, Constructed and Operated (C5-C7) as [Dummy], Accidents (C9) as [Dependent] and Months (C8) as [Exposure]. On Step 2 dialogue enter 1 for Omit Level and leave other entries unchanged. Check all output options to obtain the following output. Some tables have been shortened to save space.

Poisson Regression

Dependent Variable: Accidents

Exposure: Months

Valid Number of Cases: 34, 6 Omitted

Regression Results

	Coeffi cient	Standard Error	Wald Statistic	Signi ficance	Lower 95%	Upper 95%
Constant	-6.4029	0.2175	866.444	0.0000	-6.8292	-5.9765
Type = Type B	-0.5447	0.1776	9.4055	0.0022	-0.8928	-0.1966
Type C	-0.6888	0.3290	4.3818	0.0363	-1.3337	-0.0439
Type D	-0.0743	0.2906	0.0654	0.7981	-0.6438	0.4952
Type E	0.3205	0.2358	1.8485	0.1740	-0.1415	0.7826
Constructed = C 65-69	0.6958	0.1497	21.6190	0.0000	0.4025	0.9892
C 70-74	0.8175	0.1698	23.1665	0.0000	0.4846	1.1503
C 75-79	0.4450	0.2332	3.6397	0.0564	-0.0122	0.9021
Operated = O 75-79	0.3839	0.1183	10.5357	0.0012	0.1521	0.6156

Goodness of Fit Tests

	-2 Log likelihood
Initial Model	244.1948
Final Model	136.8291

	Chi-Square Statistic	Degrees of Freedom	Right-Tail Probability
Pearson	38.9626	24	0.0276
Likelihood Ratio	107.3657	8	0.0000

	Pseudo R-squared
McFadden	0.4397
Adjusted McFadden	0.3660
Cox & Snell	0.9575
Nagelkerke	0.9582

Correlation Matrix of Regression Coefficients

	Constant	Type = Type B	Type C	Type D	Type E
Constant	1.0000	-0.8115	-0.3783	-0.3706	-0.4680
Type = Type B	-0.8115	1.0000	0.4331	0.4469	0.5698
Type C	-0.3783	0.4331	1.0000	0.2377	0.3132
Type D	-0.3706	0.4469	0.2377	1.0000	0.3349
Type E	-0.4680	0.5698	0.3132	0.3349	1.0000
Constructed = C 65-69	-0.4847	0.0862	0.0360	0.0277	-0.0052
C 70-74	-0.5509	0.2722	0.0458	0.0284	-0.0377
C 75-79	-0.4015	0.2285	0.0968	-0.0962	0.0476
Operated = O 75-79	-0.2164	0.0256	-0.0030	-0.0047	0.0263

	Constructed = C 65-69	C 70-74	C 75-79	Operated = O 75-79
Constant	-0.4847	-0.5509	-0.4015	-0.2164
Type = Type B	0.0862	0.2722	0.2285	0.0256
Type C	0.0360	0.0458	0.0968	-0.0030
Type D	0.0277	0.0284	-0.0962	-0.0047
Type E	-0.0052	-0.0377	0.0476	0.0263
Constructed = C 65-69	1.0000	0.6337	0.4758	-0.1196
C 70-74	0.6337	1.0000	0.5494	-0.2629
C 75-79	0.4758	0.5494	1.0000	-0.3150
Operated = O 75-79	-0.1196	-0.2629	-0.3150	1.0000

Covariance Matrix of Regression Coefficients

	Constant	Type = Type B	Type C	Type D	Type E
Constant	0.0473	-0.0314	-0.0271	-0.0234	-0.0240
Type = Type B	-0.0314	0.0315	0.0253	0.0231	0.0239
Type C	-0.0271	0.0253	0.1083	0.0227	0.0243
Type D	-0.0234	0.0231	0.0227	0.0844	0.0229
Type E	-0.0240	0.0239	0.0243	0.0229	0.0556
Constructed = C 65-69	-0.0158	0.0023	0.0018	0.0012	-0.0002
C 70-74	-0.0204	0.0082	0.0026	0.0014	-0.0015
C 75-79	-0.0204	0.0095	0.0074	-0.0065	0.0026
Operated = O 75-79	-0.0056	0.0005	-0.0001	-0.0002	0.0007

	Constructed = C 65-69	C 70-74	C 75-79	Operated = O 75-79
Constant	-0.0158	-0.0204	-0.0204	-0.0056
Type = Type B	0.0023	0.0082	0.0095	0.0005
Type C	0.0018	0.0026	0.0074	-0.0001
Type D	0.0012	0.0014	-0.0065	-0.0002
Type E	-0.0002	-0.0015	0.0026	0.0007
Constructed = C 65-69	0.0224	0.0161	0.0166	-0.0021
C 70-74	0.0161	0.0288	0.0218	-0.0053
C 75-79	0.0166	0.0218	0.0544	-0.0087
Operated = O 75-79	-0.0021	-0.0053	-0.0087	0.0140

Incidence Rate Ratio

	Incidence Rate Ratio	Standard Error	Lower 95%	Upper 95%
Type = Type B	0.5800	0.1030	0.4095	0.8215
Type C	0.5022	0.1652	0.2635	0.9571
Type D	0.9284	0.2697	0.5253	1.6408
Type E	1.3779	0.3248	0.8680	2.1871
Constructed = C 65-69	2.0054	0.3001	1.4956	2.6890
C 70-74	2.2647	0.3846	1.6235	3.1592
C 75-79	1.5604	0.3640	0.9879	2.4648
Operated = O 75-79	1.4679	0.1736	1.1642	1.8509

Case (Diagnostic) Statistics

	Actual Y	Fitted Y	95% lb Actual Y	95% ub Actual Y	Standard Error of Fitted
1	0.0000	0.2104	-0.2159	0.6367	0.2175
2	0.0000	0.1532	-0.2858	0.5922	0.2240
3	3.0000	3.6382	3.2553	4.0210	0.1953
…	…	…	…	…	…

	Deviance	Residuals	Standardised Residuals
1	0.4208	-0.2104	-0.4587
2	0.3064	-0.1532	-0.3914
3	0.1191	-0.6382	-0.3346
…	…	…	…

Poisson Regression

Previous topic | Next topic