Letter to the editor: Discussion of proposed t-statistic in “ppcor: An R Package for a fast calculation to semi-partial correlation coefficients,” CSAM 2015; 22:665–674
Communications for Statistical Applications and Methods 2022;29:393-396
Published online May 31, 2022
© 2022 Korean Statistical Society.

Anthony Britto1,a

aChair of Energy Economics, Karlsruhe Institute of Technology, Germany
Correspondence to: 1 Chair of Energy Economics, Karlsruhe Institute of Technology, Hertzstr. 16 Building 06.33, Karlsruhe 76187, Germany. E-mail: anthony.britto@kit.edu
Received November 22, 2021; Revised January 12, 2022; Accepted January 13, 2022.
1. Letter

Dear Prof. Seongjoo Song,

This letter concerns the article “ppcor: An R Package for a fast calculation to semi-partial correlation coefficients” by Prof. Seongho Kim, which appears in the November 2015 issue of Communications for Statistical Applications and Methods (Kim, 2015). I believe that there is a discrepancy, one which merits a detailed discussion, between the t-statistic of the semi-partial correlation coefficient proposed by Prof. Kim and the one proposed by Cohen et al. (2002).

Equation (2.8) in Prof. Kim’s article lists the t-statistics of the partial and semi-partial correlation coefficients respectively as

$$t_{ij|S} = r_{ij|S}\sqrt{\frac{n-2-g}{1-r_{ij|S}^2}}, \tag{1.1}$$

$$t_{i(j|S)} = r_{i(j|S)}\sqrt{\frac{n-2-g}{1-r_{i(j|S)}^2}}, \tag{1.2}$$

where $r_{ij|S}$ (resp. $r_{i(j|S)}$) is the partial (resp. semi-partial) correlation coefficient of the random variables $x_i$ and $x_j$ with g covariates. Each of these variables is an element of the vector of random variables $X = (x_1, x_2, \ldots, x_n)^T$; $X_S$ subsequently denotes the random sub-vector of $X$ that results from deleting $x_i$ and $x_j$.

An immediate consequence of the above formulae, which are identical as functions of $r_{ij|S}$ and $r_{i(j|S)}$, is that since $r_{ij|S} \neq r_{i(j|S)}$ in general, the respective t-statistics will also not agree: $t_{ij|S} \neq t_{i(j|S)}$. (We have $r_{ij|S} = r_{i(j|S)}$ only in the trivial case where each of the g covariates has zero correlation with $x_i$; cf. equations (2.1) and (2.2) in Kim (2015).) However, Cohen et al. (2002) argue that since the partial and semi-partial correlation coefficients are different scalings of the same statistical phenomenon, they must yield an identical t-statistic. In fact, they claim further that the t-statistic of the partial and semi-partial correlation must also equal the one yielded by the regression coefficient $\beta_j$ of $x_j$ in a multiple linear regression of $x_i$ on $x_j$ together with the other g covariates.

Their argument is as follows. Consider the case of a random variable $y$ regressed on two predictors $x_1$ and $x_2$, where all variables have been standardised; the three effect sizes in the previous paragraph for the predictor $x_1$, for instance, are then given by

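$$\beta_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}, \tag{1.3}$$

$$r_{y1|2} = \frac{r_{y1} - r_{y2}\,r_{12}}{\sqrt{1 - r_{y2}^2}\,\sqrt{1 - r_{12}^2}}, \tag{1.4}$$

$$r_{y(1|2)} = \frac{r_{y1} - r_{y2}\,r_{12}}{\sqrt{1 - r_{12}^2}}. \tag{1.5}$$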

Since these effects differ only with regard to their denominators, “none can equal zero unless the others are also zero, so it is also not surprising that they must yield the same $t_i$ value [t-statistic] for the statistical significance of their departure from zero” (Cohen et al., 2002).
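The shared-numerator argument can be checked numerically. The following is a minimal Python sketch of equations (1.3) through (1.5); the function name `effect_sizes` and the input correlations are hypothetical values chosen purely for illustration and do not come from Cohen et al. (2002) or from any data set.

```python
import math


def effect_sizes(r_y1, r_y2, r_12):
    """Return (beta_1, partial, semi_partial) for predictor x1, controlling for x2.

    All three quantities share the numerator r_y1 - r_y2 * r_12 and differ
    only in their (strictly positive) denominators.
    """
    numerator = r_y1 - r_y2 * r_12
    beta_1 = numerator / (1.0 - r_12 ** 2)
    partial = numerator / (math.sqrt(1.0 - r_y2 ** 2) * math.sqrt(1.0 - r_12 ** 2))
    semi_partial = numerator / math.sqrt(1.0 - r_12 ** 2)
    return beta_1, partial, semi_partial


# Hypothetical correlations, for illustration only.
beta_1, partial, semi_partial = effect_sizes(r_y1=0.5, r_y2=0.4, r_12=0.3)

# If the numerator vanishes (r_y1 = r_y2 * r_12), all three effects vanish together.
zero_case = effect_sizes(r_y1=0.12, r_y2=0.4, r_12=0.3)
```

Because the denominators are positive whenever the correlations lie strictly between -1 and 1, the three effect sizes always share their sign, and the partial correlation is never smaller in magnitude than the semi-partial one.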

The respective formulae for the t-statistics of these quantities are again found in Cohen et al. (2002) and elsewhere in the literature, for instance in the review article by Aloe and Thompson (2013). I reproduce here the formulae for the partial and semi-partial correlations (the formula for the t-statistic of the $\beta_i$ is presented in the appendix), reverting now to the setup and notation of Prof. Kim above:

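$$t_{ij|S} = r_{ij|S}\sqrt{\frac{\mathrm{df}}{1-r_{ij|S}^2}}, \tag{1.6}$$

$$t_{i(j|S)} = r_{i(j|S)}\sqrt{\frac{\mathrm{df}}{1-R_{i|jS}^2}}, \tag{1.7}$$

where df is the degrees of freedom, and $R_{i|jS}^2$ the coefficient of determination of the linear model where $x_i$ is regressed on $x_j$ and the other covariates,

$$\tilde{x}_i = \beta_j \tilde{x}_j + \sum_{x_k \in X_g} \beta_k \tilde{x}_k \tag{1.8}$$

for $X_g \subset X$ the set of g covariates. The tilde indicates that the variables have been standardised; since this means that the intercept vanishes, we have

$$\mathrm{df} = n - g - 2. \tag{1.9}$$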

Comparing equations (1.6) and (1.7) with equations (1.1) and (1.2), we see that the sole difference from Prof. Kim’s article lies in equation (1.2): $r_{i(j|S)}^2$ would need to be replaced with $R_{i|jS}^2$.
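For readers who want the intermediate step, the following short derivation sketches why the two t-statistics of Cohen et al. (2002) coincide. It writes $R_{i|jS}^2$ for the coefficient of determination of $x_i$ regressed on $x_j$ and the covariates, and introduces $R_{i|S}^2$, a symbol not used in the letter, for the coefficient of determination of $x_i$ regressed on the g covariates alone. It relies on two standard identities: the squared semi-partial correlation is the increment in $R^2$ from adding $x_j$, and the squared partial correlation is that increment rescaled by the variance left unexplained by the covariates.

```latex
\begin{align*}
r_{i(j|S)}^2 &= R_{i|jS}^2 - R_{i|S}^2,
\qquad
r_{ij|S}^2 = \frac{r_{i(j|S)}^2}{1 - R_{i|S}^2}, \\[4pt]
1 - r_{ij|S}^2 &= \frac{1 - R_{i|S}^2 - r_{i(j|S)}^2}{1 - R_{i|S}^2}
              = \frac{1 - R_{i|jS}^2}{1 - R_{i|S}^2}, \\[4pt]
\frac{r_{ij|S}}{\sqrt{1 - r_{ij|S}^2}}
  &= \frac{r_{i(j|S)}}{\sqrt{1 - R_{i|S}^2}}
     \cdot \frac{\sqrt{1 - R_{i|S}^2}}{\sqrt{1 - R_{i|jS}^2}}
   = \frac{r_{i(j|S)}}{\sqrt{1 - R_{i|jS}^2}}.
\end{align*}
```

Multiplying both sides of the last line by $\sqrt{\mathrm{df}}$ gives the equality of the partial and semi-partial t-statistics when the semi-partial one has $1 - R_{i|jS}^2$ in its denominator. The two coefficients themselves coincide only when $R_{i|S}^2 = 0$.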

A trivial consequence of this, but one worth noting, is that such a change would automatically simplify equation (2.9) in Prof. Kim’s article: since $t_{ij|S} = t_{i(j|S)}$, the partial and semi-partial correlation coefficients would then have identical p-values:

$$p_{ij|S} = 2\,\Phi\!\left(-|t_{ij|S}|, \mathrm{df}\right) = 2\,\Phi\!\left(-|t_{i(j|S)}|, \mathrm{df}\right) = p_{i(j|S)}, \tag{1.10}$$

where $\Phi(\cdot, \mathrm{df})$ is the cumulative distribution function of a Student’s t distribution with degrees of freedom df as in equation (1.9) above.
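The two-sided p-value in equation (1.10) can be evaluated without a statistics library. The sketch below is one possible approach, not the method used by ppcor or any other package: it integrates the Student's t density with Simpson's rule, using $2\Phi(-|t|) = 1 - 2\int_0^{|t|} f(x)\,dx$. The function names `t_pdf` and `two_sided_p` and the step count are assumptions made for illustration. The two example inputs are the t-statistics and degrees of freedom reported in the appendix of this letter.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value 2 * Phi(-|t|, df) via Simpson's rule on [0, |t|].

    `steps` must be even; 2000 is ample for moderate |t| but is an
    arbitrary choice for this sketch.
    """
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * t_pdf(k * h, df)
    central_mass = total * h / 3.0  # probability mass on [0, |t|]
    return max(0.0, 1.0 - 2.0 * central_mass)


# t-statistics and degrees of freedom taken from the appendix of this letter.
p_education = two_sided_p(0.2473, 43)
p_prestige = two_sided_p(5.0625, 43)
```

Since the p-value depends on the coefficient only through $|t|$ and df, any two effect sizes with equal t-statistics receive the same p-value from this function.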

I present an example calculation within the framework of Cohen et al. (2002) in the appendix, demonstrating that their formulae do in fact work as claimed. I extend my gratitude to Prof. Kim for his R package, and look forward to a productive discussion on this discrepancy.

Sincerely,

Anthony Britto

Appendix: Example calculation: Duncan’s occupational prestige data

I demonstrate here that the formulae of Cohen et al. (2002) work as claimed; I employ a standard, freely available data set, Duncan’s Occupational Prestige Data, available in the carData package in R (Fox et al., 2020). The data quantify the prestige and other characteristics of 45 U.S. occupations in 1950; descriptions of the three columns used in the analysis are found in Table A.1.

The correlation matrix of the three variables is computed as follows:

            income   education   prestige
income      1.0000   0.7245      0.8378
education   0.7245   1.0000      0.8519
prestige    0.8378   0.8519      1.0000

From this matrix, it is possible to directly compute the $\beta$’s and the partial and semi-partial correlations using equations (1.3) through (1.5) above, where I set income to be the $y$ variable and education and prestige to be $x_1$ and $x_2$ respectively (see table below).

The t-statistics of the partial and semi-partial correlations can then be computed from equations (1.6) and (1.7); for equation (1.7), we need the coefficient of determination of the model where $y$ is regressed on $x_1$ and $x_2$. Using the method of least squares, this is found to be $R_y^2 = 0.7023$ ($\mathrm{df} = 45 - 2 = 43$). As claimed by Cohen et al. (2002), the t-statistics of the partial and semi-partial correlation coefficients agree; for the variable prestige, for instance, we see that

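$$\begin{aligned}
t_{y2|1} &= r_{y2|1}\sqrt{\frac{\mathrm{df}}{1-r_{y2|1}^2}} = 0.6111\sqrt{\frac{43}{1-0.6111^2}} = 5.0625 \\
&= 0.4212\sqrt{\frac{43}{1-0.7023}} = r_{y(2|1)}\sqrt{\frac{\mathrm{df}}{1-R_y^2}} = t_{y(2|1)}.
\end{aligned} \tag{A.1}$$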
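The agreement of the two t-statistics can be reproduced from the printed correlation matrix alone. The sketch below is an illustrative check, not the author's original computation. It uses the four-decimal correlations given above, so results match the quoted figures only up to rounding. It also assumes the standard closed form for $R_y^2$ in a two-predictor standardised model, $R_y^2 = (r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}) / (1 - r_{12}^2)$, in place of an explicit least-squares fit.

```python
import math

# Rounded correlations from the matrix above: y = income, x1 = education, x2 = prestige.
r_y1, r_y2, r_12 = 0.7245, 0.8378, 0.8519
df = 43  # degrees of freedom as used in the appendix

# Coefficient of determination of y regressed on x1 and x2 (two-predictor closed form).
r2_y = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)

# Partial and semi-partial correlation of y with prestige (x2), controlling for education (x1).
numerator = r_y2 - r_y1 * r_12
partial = numerator / math.sqrt((1.0 - r_y1 ** 2) * (1.0 - r_12 ** 2))
semi_partial = numerator / math.sqrt(1.0 - r_12 ** 2)

# t-statistics following equations (1.6) and (1.7).
t_partial = partial * math.sqrt(df / (1.0 - partial ** 2))
t_semi_partial = semi_partial * math.sqrt(df / (1.0 - r2_y))
```

With these inputs the two t-statistics agree to floating-point precision, and both lie within rounding error of the value 5.0625 quoted in equation (A.1).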

This should be contrasted with what occurs if Prof. Kim’s expression, equation (1.2), is employed:

$$t_{y(2|1)} = r_{i(j|S)}\sqrt{\frac{n-2-g}{1-r_{i(j|S)}^2}} = 0.4212\sqrt{\frac{43}{1-0.4212^2}} \approx 3.045. \tag{A.2}$$

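Finally, for the t-statistic of the $\beta_i$ we have the usual formula

$$t(\beta_i) = \frac{\beta_i}{\mathrm{s.e.}(\beta_i)}, \tag{A.3}$$

where $\mathrm{s.e.}(\beta_i)$ is the standard error of the estimate (Cohen et al., 2002),

$$\mathrm{s.e.}(\beta_i) = \sqrt{\frac{1 - R_y^2}{\mathrm{df}\,\left(1 - R_{iJ}^2\right)}}. \tag{A.4}$$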

Here, $R_y^2$ is as above, and $R_{iJ}^2$ is the coefficient of determination of the model where the ith predictor is regressed on all other predictors. In the case of a model with just two predictors, $R_{iJ}^2 = r_{ij}^2$, and the standard errors of the coefficients are identical: 0.1589 in this case. Hence, from equations (A.3) and then (1.10), I finally arrive at the following coefficient table, which neatly summarises that these effect sizes exist on different scales, but nevertheless share their statistical significance.

            $\beta_i$   $r_{yi|j}$   $r_{y(i|j)}$   t-stat.   p-value
education   0.0393      0.0377       0.0206         0.2473    0.8058
prestige    0.8043      0.6111       0.4212         5.0625    0.0000
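The $\beta_i$ column and its t-statistics can be recomputed from the same three correlations. This is a minimal sketch under the appendix's own conventions (standardised variables, df = 43); the variable names are illustrative. Because the inputs are the four-decimal correlations printed above, agreement with the table is only up to rounding.

```python
import math

# Rounded correlations: y = income, x1 = education, x2 = prestige.
r_y1, r_y2, r_12 = 0.7245, 0.8378, 0.8519
df = 43  # degrees of freedom as used in the appendix

# Standardised regression coefficients, equation (1.3) and its counterpart for x2.
beta = {
    "education": (r_y1 - r_y2 * r_12) / (1.0 - r_12 ** 2),
    "prestige": (r_y2 - r_y1 * r_12) / (1.0 - r_12 ** 2),
}

# For standardised predictors, R_y^2 is the sum of beta_i * r_yi.
r2_y = beta["education"] * r_y1 + beta["prestige"] * r_y2

# Standard error: with two predictors R_iJ^2 = r_12^2, so it is the same for both coefficients.
std_error = math.sqrt((1.0 - r2_y) / (df * (1.0 - r_12 ** 2)))

# t-statistic of each coefficient: beta_i divided by its standard error.
t_beta = {name: value / std_error for name, value in beta.items()}
```

The resulting t-statistics can be compared row by row with the t-stat. column of the table, which is the same column obtained from the partial and semi-partial correlations.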
TABLES

Table A.1

Description of variables in Duncan’s dataset

income      Percentage of occupational incumbents in the 1950 US Census who earned $3,500 or more per year (about $36,000 in 2017 US dollars).
education   Percentage of occupational incumbents in 1950 who were high school graduates.
prestige    Percentage of respondents in a social survey who rated the occupation as “good” or better in prestige.

References
  1. Aloe A and Thompson C (2013). The synthesis of partial effect sizes. Journal of the Society for Social Work and Research, 4.
  2. Cohen J, Cohen P, West S, and Aiken L (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, (pp. 64-101) (3rd ed), Lawrence Erlbaum Associates, Inc.
  3. Fox J, Weisberg S, and Price B (2020). carData: Companion to applied regression data sets, R package version 3.0–4.
  4. Kim SH (2015). ppcor: An R Package for a fast calculation to semi-partial correlation coefficients. Communications for Statistical Applications and Methods, 22, 665-674.
  5. Vallat R (2018). Pingouin: Statistics in Python. Journal of Open Source Software, 3.