Unistat Statistics Software | Correlation Coefficients

6.2.1. Correlation Coefficients

Four correlation coefficients (Pearson product moment, Spearman rank, Kendall rank and point biserial) are available under this menu item, and their results are presented on a single page of output.


Two or more columns can be selected by clicking on [Variable]. Correlations are computed between all possible pairs of selected columns, provided the two columns in a pair have the same size. For each test, any pair of cases with one or more missing values is omitted and the degrees of freedom are adjusted. The Output Options Dialogue lets you choose which tests appear in the output.
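The pairwise handling of missing values described above can be sketched as follows. This is a hypothetical illustration, not UNISTAT's code: columns are assumed to be Python lists in which `None` marks a missing value, and only the Pearson coefficient is computed for each pair.

```python
from itertools import combinations
from math import sqrt


def complete_pairs(x, y):
    """Keep only the cases where neither value is missing (None)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def pearson(pairs):
    """Pearson r for a list of (x, y) tuples."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def all_pairs(columns):
    """Correlate every pair of equal-length columns, deleting missing cases pairwise.

    Returns {(name1, name2): (r, valid_pairs, degrees_of_freedom)}.
    """
    results = {}
    for (name1, x), (name2, y) in combinations(columns.items(), 2):
        if len(x) != len(y):
            continue  # pairs of columns with different sizes are skipped
        pairs = complete_pairs(x, y)
        results[(name1, name2)] = (pearson(pairs), len(pairs), len(pairs) - 2)
    return results


data = {"a": [1, 2, 3, 4, None], "b": [2, 4, 6, 8, 10], "c": [5, 4, 3, 2, 1]}
for names, (r, n, df) in all_pairs(data).items():
    print(names, round(r, 4), n, df)
```

Each pair of columns keeps its own count of valid cases, so the degrees of freedom can differ from one pair to the next.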

If a factor column is selected, the data are assumed not to be paired and only the point biserial correlation is computed.

6.2.1.1. Pearson Product Moment Correlation

The aim of this correlation coefficient is to establish the degree of linear relationship between two variables. The coefficient is defined as the covariance of the two samples divided by the product of their standard deviations.

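The probability value is based on Student's t-distribution with n − 2 degrees of freedom, where n is the number of valid pairs and the t-statistic is calculated as:

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$
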
This correlation coefficient is a relatively poor measure of association because it takes no account of the distributions of the two variables, and outliers can have a considerable effect on it. This makes it difficult to conclude that one linear correlation is significantly better than another. The nonparametric rank correlation coefficients, Spearman's rho and Kendall's tau, are more robust measures.

Pairs with one or more missing values are omitted and the degrees of freedom are adjusted. The output includes the correlation coefficient, its confidence interval, t-statistic, degrees of freedom and one- and two-tailed probabilities.
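The Pearson coefficient and its t-statistic can be sketched in a few lines of standard-library Python. The confidence-interval function is an assumption: the text does not state how the interval is obtained, but a Fisher z-transformation with standard error 1/√(n − 3) reproduces the Pearson interval (0.2961, 0.9129) of the Gardner & Altman example below to within about 0.0001, and appears to reproduce the other intervals in this section as well. The one- and two-tailed probabilities would additionally need a Student's t CDF (for example from SciPy) and are omitted here.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # the 1/(n - 1) factors cancel


def t_statistic(r, n):
    """Return t = r * sqrt((n - 2) / (1 - r^2)) and its n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r)), n - 2


def fisher_ci(r, n, level=0.95):
    """Assumed method: Fisher z-transform with standard error 1 / sqrt(n - 3)."""
    z = atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# Values from the Basal/Energy example: r = 0.7283 with 13 valid pairs.
t, df = t_statistic(0.7283, 13)
low, high = fisher_ci(0.7283, 13)
print(f"t = {t:.4f}, df = {df}, 95% CI = ({low:.4f}, {high:.4f})")
```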

Example

Table 8.5 on p. 89, Gardner & Altman (2000). The null hypothesis "basal metabolic rate and total energy expenditure are not correlated" is tested at the 95% confidence level.

Open CORRCOEF, select Statistics 1 → Correlation Coefficients, select Basal and Energy (C1 and C2) as [Variable]s, select all output options (including the Report Summary Statistics box) and click [Next] to obtain the following results:

Correlation Coefficients

For Basal and Energy

	Valid Cases	Missing	Mean	Standard Deviation
Basal	13	0	5.6515	0.4650
Energy	13	0	8.0662	1.2381
Paired	13	0

	Correlation Coefficient	Degrees of Freedom	* Test Statistic	1-Tail Probability	2-Tail Probability
Pearson	0.7283	11	3.5249	0.0024	0.0048
Spearman Rank	0.6190	11	2.6139	0.0120	0.0241
Kendall Rank	0.4258		2.0171	0.0218	0.0437
Kendall Rank with CC	0.4387		2.0782	0.0188	0.0377
Point Biserial (sample SD)	-0.7866	24	-6.2419	0.0000	0.0000
Point Biserial (pop SD)	-0.8022	24	-6.5828	0.0000	0.0000

	Lower 95%	Upper 95%
Pearson	0.2961	0.9129
Spearman Rank	0.1032	0.8724
Kendall Rank	-0.1635	0.7912
Kendall Rank with CC	-0.1481	0.7970
Point Biserial (sample SD)	-0.8998	-0.5743
Point Biserial (pop SD)	-0.9076	-0.6019

* Z-statistic for Kendall rank, t-statistic otherwise

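The two-tailed probability for the Pearson coefficient (0.0048) is well below 0.05, so the null hypothesis is rejected: basal metabolic rate and total energy expenditure are significantly correlated.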

6.2.1.2. Spearman’s Rank Correlation

Spearman's coefficient measures the correlation between the ranks of the two variables rather than between their actual values. Ranking transforms each variable into a uniformly distributed variable, which minimises the effect of outliers. Spearman's correlation coefficient (also called rho) is calculated as follows:

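$$ \rho = \frac{D - R - (K_x + K_y)/12}{\sqrt{(D - K_x/6)(D - K_y/6)}}, \qquad D = \frac{n^3 - n}{6} $$

where n is the number of valid pairs, R is the sum of squared differences between the ranks of corresponding cases of the two variables, and Kx and Ky are the sums of k³ − k, where k is the number of tied values sharing a given rank within each variable. When there are no ties, this reduces to ρ = 1 − 6R/(n³ − n). The tail probability of rho is determined by comparing the following t-statistic with Student's t-distribution on n − 2 degrees of freedom:

$$ t = \rho\sqrt{\frac{n-2}{1-\rho^2}} $$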

Pairs with at least one missing value are omitted and the degrees of freedom are adjusted. The output includes the correlation coefficient, its confidence interval, t-statistic, degrees of freedom and one- and two-tailed probabilities.
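A minimal sketch of the rank transformation and the tie-corrected coefficient follows. It assumes that tied values receive the mean of the ranks they span, and it uses the tie-corrected form ρ = (D − R − (Kx + Ky)/12) / √((D − Kx/6)(D − Ky/6)) with D = (n³ − n)/6. With averaged ranks, this expression is algebraically identical to Pearson's r computed on the ranks, which the test below checks numerically. This is an illustration, not UNISTAT's code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # mean of ranks start+1..end+1
        start = end + 1
    return ranks


def tie_sum(values):
    """Sum of k**3 - k over each group of k tied values (Kx or Ky in the text)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(k ** 3 - k for k in counts.values())


def spearman_rho(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    r = sum((a - b) ** 2 for a, b in zip(rx, ry))  # R: sum of squared rank differences
    kx, ky = tie_sum(x), tie_sum(y)
    d = (n ** 3 - n) / 6
    return (d - r - (kx + ky) / 12) / sqrt((d - kx / 6) * (d - ky / 6))


def spearman_t(rho, n):
    """t-statistic compared with Student's t on n - 2 degrees of freedom."""
    return rho * sqrt((n - 2) / (1 - rho * rho)), n - 2


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
print(rho, spearman_t(rho, len(x)))
```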

Example

Example 19.13 on p. 401 from Zar, J. H. (2010). The null hypothesis “there is no correlation between the ranks of values in the two variables” is tested.

Open CORRCOEF, select Statistics 1 → Correlation Coefficients. Select X and Y (C3 and C4) as [Variable]s and select only the Spearman Rank output option to obtain the following results:

Correlation Coefficients

For X and Y

	Correlation Coefficient	Degrees of Freedom	* Test Statistic	1-Tail Probability	2-Tail Probability
Spearman Rank	0.8511	10	5.1261	0.0002	0.0004

	Lower 95%	Upper 95%
Spearman Rank	0.5418	0.9574

* Z-statistic for Kendall rank, t-statistic otherwise

This result shows a significant rank correlation, so the null hypothesis is rejected. Note that the denominator of rho evaluates to 240 rather than the 242 given in the book.

6.2.1.3. Kendall’s Rank Correlation

Like Spearman's rho, Kendall's tau is a rank correlation coefficient and therefore shares its advantage over the Pearson product moment correlation. It is also more robust, because it considers only the relative ordering of pairs of cases rather than the numerical differences between ranks used by Spearman's rho. Kendall's tau is calculated as:

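$$ \tau = \frac{2R}{\sqrt{(n^2 - n - K_x)(n^2 - n - K_y)}} $$

where n is the number of valid pairs and R is the score: the number of pairs of cases ordered the same way in both variables (concordant pairs) minus the number ordered in opposite ways (discordant pairs), with pairs tied in either variable contributing nothing. Kx and Ky are the sums of k² − k, where k is the number of tied values sharing a given rank within each variable. For tau with continuity correction, R is augmented by one.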

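The tail probability of tau is obtained by comparing z = R/σ with the standard normal distribution, where σ, the standard deviation of R, is:

$$ \sigma = \sqrt{\frac{J(2n+5) - Q_x - Q_y}{18} + \frac{P_x P_y}{9J(n-2)} + \frac{K_x K_y}{2J}} $$

where: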

· Px = sum of (k² − k)(k − 2) for X

· Py = sum of (k² − k)(k − 2) for Y

· Qx = sum of (k² − k)(2k + 5) for X

· Qy = sum of (k² − k)(2k + 5) for Y

· J = n² − n.
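The score, coefficient and normal approximation described above can be sketched as follows. This is an O(n²) illustration, not UNISTAT's code. It assumes that R is the score (concordant minus discordant pairs, tied pairs contributing zero), which is the reading that appears consistent with the tabulated Kendall results, and it applies the continuity correction exactly as the text describes, by adding one to R.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def tie_sizes(values):
    """Sizes k of each group of tied values (groups of one are not ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [k for k in counts.values() if k > 1]


def kendall_tau(x, y, continuity=False):
    """Return (tau, z, one-tailed P, two-tailed P) using the normal approximation."""
    n = len(x)
    # Score R: +1 for each concordant pair, -1 for each discordant pair, 0 if tied.
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    if continuity:
        score += 1  # continuity correction as described: R augmented by one

    big_j = n * n - n  # J in the text
    tx, ty = tie_sizes(x), tie_sizes(y)
    kx = sum(k * k - k for k in tx)
    ky = sum(k * k - k for k in ty)
    px = sum((k * k - k) * (k - 2) for k in tx)
    py = sum((k * k - k) * (k - 2) for k in ty)
    qx = sum((k * k - k) * (2 * k + 5) for k in tx)
    qy = sum((k * k - k) * (2 * k + 5) for k in ty)

    tau = 2 * score / sqrt((big_j - kx) * (big_j - ky))
    variance = ((big_j * (2 * n + 5) - qx - qy) / 18
                + px * py / (9 * big_j * (n - 2))
                + kx * ky / (2 * big_j))
    z = score / sqrt(variance)
    one_tail = 1 - NormalDist().cdf(abs(z))
    return tau, z, one_tail, 2 * one_tail


x = list(range(10))
y = [1, 1, 2, 2, 3, 3, 4, 4, 5, 6]  # four tied pairs in y, no ties in x
print(kendall_tau(x, y))
print(kendall_tau(x, y, continuity=True))
```

The sketch needs at least three cases, since the Px·Py term divides by n − 2.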

Example

Table 56 on p. 160 from Cohen, L. & M. Holliday (1983). Ten trainees on a management course have been rated on a personality measure Introversion and on an Attitude to Change scale. The null hypothesis “there is no correlation between these two rankings” is tested.

Open CORRCOEF and select Statistics 1 → Correlation Coefficients. Select Introversion and Attitude (C5 and C6) as [Variable]s and select only the Kendall Rank output option to obtain the following results:

Correlation Coefficients

For Introversion and Attitude

	Correlation Coefficient	Degrees of Freedom	* Test Statistic	1-Tail Probability	2-Tail Probability
Kendall Rank	0.6286		2.4545	0.0071	0.0141
Kendall Rank with CC	0.6519		2.5455	0.0055	0.0109

	Lower 95%	Upper 95%
Kendall Rank	-0.0017	0.9014
Kendall Rank with CC	0.0377	0.9086

* Z-statistic for Kendall rank, t-statistic otherwise

This result shows a significant rank correlation between the Introversion/Extraversion rating and the Attitude to Change rating: the two-tailed probability (0.0141) is significant at the 5% level and the one-tailed probability (0.0071) at the 1% level.

6.2.1.4. Point Biserial Correlation

This is an alternative to the linear (Pearson's) correlation coefficient when the first variable is continuous and the second variable is binary. Let P and Q denote the two categories of the binary variable, n_p and n_q the respective numbers of cases in P and Q, and n = n_p + n_q the total number of valid cases. There are two alternative ways of calculating the coefficient:

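Using sample standard deviation:

$$ r_{pb} = \frac{M_p - M_q}{SD_{samp}} \cdot \frac{\sqrt{n_p n_q}}{n} $$

where M_p and M_q are the means of the continuous variable in groups P and Q, and SD_samp is the sample standard deviation of the two samples combined:

$$ SD_{samp} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where x_i are the n values of both samples and x̄ is their overall mean.

Using population standard deviation:

$$ r_{pb} = \frac{M_p - M_q}{SD_{pop}} \cdot \frac{\sqrt{n_p n_q}}{n} $$

where SD_pop is the population standard deviation of the two samples combined:

$$ SD_{pop} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} $$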

Before this release of UNISTAT, the version with sample standard deviation was used.

In both cases, the following t-value is compared with Student's t-distribution on n − 2 degrees of freedom:

$$ t = r_{pb}\sqrt{\frac{n-2}{1-r_{pb}^2}} $$

The data for this test can be in any of the three formats supported for Two Sample Tests. If the last data option, Test Statistics are Given, is selected, the program prompts for the sizes, means and standard deviations of the two samples. Missing values are omitted case by case and the degrees of freedom are adjusted accordingly.
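Both versions of the point biserial coefficient and the t-value can be sketched from summary statistics, which also mirrors the Test Statistics are Given data option. This is an illustrative sketch, not UNISTAT's code: the combined sum of squares is rebuilt from each group's size, mean and sample standard deviation, and the function names are hypothetical. Feeding in the summary statistics of Example 1 below (6 cases, mean 82.3333, SD 5.1251; 4 cases, mean 65.0000, SD 4.0825) gives r ≈ 0.8480 and t ≈ 4.526 with the sample-SD version. The population-SD version coincides with Pearson's r computed against a 0/1 coding of the binary variable.

```python
from math import sqrt
from statistics import mean, stdev


def point_biserial_from_stats(n_p, m_p, s_p, n_q, m_q, s_q, sd="sample"):
    """Point biserial r, t and df from group sizes, means and sample SDs."""
    n = n_p + n_q
    # Total sum of squares of the combined data = within-group + between-group parts.
    ss = (n_p - 1) * s_p ** 2 + (n_q - 1) * s_q ** 2 + n_p * n_q / n * (m_p - m_q) ** 2
    divisor = n - 1 if sd == "sample" else n  # sample vs population SD of combined data
    sd_combined = sqrt(ss / divisor)
    r = (m_p - m_q) / sd_combined * sqrt(n_p * n_q) / n
    t = r * sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2


def point_biserial(p_values, q_values, sd="sample"):
    """Same coefficient computed from the raw values of the P and Q groups."""
    return point_biserial_from_stats(
        len(p_values), mean(p_values), stdev(p_values),
        len(q_values), mean(q_values), stdev(q_values), sd)


# Summary statistics from Example 1 (groups 0 and 1 of Off Campus).
print(point_biserial_from_stats(6, 82.3333, 5.1251, 4, 65.0, 4.0825))
```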

Example 1: Point biserial correlation using sample standard deviation

Table 57 on p. 164 from Cohen, L. & M. Holliday (1983). Examination scores of on-campus and off-campus social work students are given in one column of the table and their residence pattern in a second column.

Open CORRCOEF and select Statistics 1 → Correlation Coefficients. Select Score (C7) as [Variable] and Off Campus (C8) as [Factor], and select the Point Biserial and Report Summary Statistics output options to obtain the following results:

Correlation Coefficients

Data variable: Score

Subsample selected by: Off Campus = 0,1

	Valid Cases	Missing	Mean	Standard Deviation
0	6	0	82.3333	5.1251
1	4	0	65.0000	4.0825

	Correlation Coefficient	Degrees of Freedom	* Test Statistic	1-Tail Probability	2-Tail Probability
Point Biserial	0.8480	8	4.5260	0.0010	0.0019

	Lower 95%	Upper 95%
Point Biserial	0.4686	0.9633

* Z-statistic for Kendall rank, t-statistic otherwise

This result shows a significant correlation between examination scores and residence at the 1% level (two-tailed P = 0.0019).

Example 2: Point biserial correlation using population standard deviation

Example 19.16 on p. 410 from Zar, J. H. (2010). The null hypothesis that there is no correlation between blood-clotting time and drug is tested.

Open CORRCOEF and select Statistics 1 → Correlation Coefficients. Select X1 (C12) as [Factor] and Y1 (C13) as [Variable], and select only the Point Biserial output option to obtain the following results:

Correlation Coefficients

Data variable: Y1

Subsample selected by: X1 = 0,1

	Correlation Coefficient	Degrees of Freedom	* Test Statistic	1-Tail Probability	2-Tail Probability
Point Biserial	-0.5983	11	-2.4765	0.0154	0.0308

	Lower 95%	Upper 95%
Point Biserial	-0.8643	-0.0706

* Z-statistic for Kendall rank, t-statistic otherwise

Since the two-tailed P = 0.0308 < 0.05, the null hypothesis is rejected.
