Theory of modified Poisson regression for pooled individual-level data
We begin by describing the general theory of modified Poisson regression in single-database studies. Let X be a vector of covariates, E a binary exposure indicator (E = 1 if exposed and E = 0 if unexposed), and Y the binary outcome variable (Y = 1 if the outcome occurs and Y = 0 otherwise).
Let Z be a vector of information on the exposure and covariates. Specifically, \( \boldsymbol{Z}={\left(1,g\left(E,{\boldsymbol{X}}^T\right)\right)}^T \), where \( g\left(E,{\boldsymbol{X}}^T\right) \) is a vector containing a specified function of E and X. Assume the risk of the outcome conditional on E and X can be written as
$$ P\left(Y=1|E,\boldsymbol{X}\right)=\exp \left({\boldsymbol{\beta}}^T\boldsymbol{Z}\right) $$
(1)
where β is an unknown vector of parameters. An example of model (1) is \( P\left(Y=1|E,\boldsymbol{X}\right)=\exp \left({\beta}_0+{\beta}_E\ E+{\boldsymbol{\beta}}_{\boldsymbol{X}}^T\boldsymbol{X}\right) \). In this special case, the risk ratio of the outcome comparing the exposed to the unexposed, adjusting for covariates X, is \( P\left(Y=1|E=1,\boldsymbol{X}\right)/P\left(Y=1|E=0,\boldsymbol{X}\right)=\exp \left({\beta}_E\right) \).
Alternatively, we might assume a more flexible model that allows interactions between E and X: \( P\left(Y=1|E,\boldsymbol{X}\right)=\exp \left({\beta}_0+{\beta}_E\ E+{\boldsymbol{\beta}}_{\boldsymbol{X}}^T\boldsymbol{X}+{\boldsymbol{\beta}}_{E\boldsymbol{X}}^TE\boldsymbol{X}\right) \). Under this model, the risk ratio of the outcome comparing the exposed to the unexposed, adjusting for covariates X, is \( P\left(Y=1|E=1,\boldsymbol{X}\right)/P\left(Y=1|E=0,\boldsymbol{X}\right)=\exp \left({\beta}_E+{\boldsymbol{\beta}}_{E\boldsymbol{X}}^T\boldsymbol{X}\right) \), which depends on the value of X.
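For concreteness, the covariate-dependent risk ratio under the interaction model can be computed directly from the fitted coefficients. The following is a minimal sketch in Python/NumPy; the coefficient and covariate values are purely illustrative and are not taken from any analysis in this paper.

```python
import numpy as np

# Hypothetical coefficients for the interaction model (illustrative values only)
beta_E = 0.40                      # main effect of exposure on the log-risk scale
beta_EX = np.array([-0.10, 0.05])  # exposure-by-covariate interaction terms
x = np.array([2.0, 1.0])           # covariate values for the individual of interest

# Adjusted risk ratio exp(beta_E + beta_EX^T x), which varies with x
risk_ratio = np.exp(beta_E + beta_EX @ x)
print(risk_ratio)                  # about 1.28 for these illustrative values
```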
Suppose we have an independent and identically distributed sample of size n. For each individual i, the following variables are measured: \( {\boldsymbol{X}}_i \), a vector of covariates; \( {E}_i \), a binary exposure indicator (\( {E}_i=1 \) if exposed and \( {E}_i=0 \) if unexposed); \( {Y}_i \), the binary outcome variable (\( {Y}_i=1 \) if the outcome occurs and \( {Y}_i=0 \) otherwise); and \( {\boldsymbol{Z}}_i \), a vector of information on the exposure and covariates, i.e., \( {\boldsymbol{Z}}_i={\left(1,g\left({E}_i,{\boldsymbol{X}}_i^T\right)\right)}^T \).
Zou [12] provided the theoretical justification for his proposed approach in the setting of a 2 × 2 table (a binary exposure and no covariates). This justification can be established more generally using the theory of unbiased estimating equations [20]. Provided model (1) is correctly specified, we have \( E\left[\left\{Y-\exp \left({\boldsymbol{Z}}^T\boldsymbol{\beta}\right)\right\}\boldsymbol{Z}\right]=\mathbf{0} \), which leads to the unbiased estimating equation
$$ \sum \limits_{i=1}^n\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\boldsymbol{\beta} \right)\right\}{\boldsymbol{Z}}_i=\mathbf{0} $$
(2)
Solving (2) for β gives \( \hat{\boldsymbol{\beta}} \), a consistent and asymptotically normal estimator for the true β.
A consistent estimator of the variance of \( \hat{\boldsymbol{\beta}} \) is then given by the sandwich variance estimator [20]
$$ \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right)=\left\{\boldsymbol{H}{\left(\hat{\boldsymbol{\beta}}\right)}^{-1}\right\}\boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)\left\{\boldsymbol{H}{\left(\hat{\boldsymbol{\beta}}\right)}^{-1}\right\} $$
(3)
where \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right)=-\sum \limits_{i=1}^n\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{i=1}^n{\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right)\right\}}^2{\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T \). Zou [12] referred to this procedure as modified Poisson regression because (2) is equivalent to the score equation for the Poisson likelihood, but the variance estimator does not rely on the Poisson distributional assumption (which is clearly unreasonable for binary outcomes).
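In practice, modified Poisson regression on pooled individual-level data can be run with standard software: the point estimates come from a Poisson working likelihood and the standard errors from a robust sandwich estimator. A minimal sketch in Python using statsmodels on simulated data (the data-generating values below are illustrative only) might look as follows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=n)                        # a single covariate
E = rng.binomial(1, 0.5, size=n)              # binary exposure
p = np.exp(-2.0 + 0.5 * E + 0.3 * X)          # risk under model (1)
Y = rng.binomial(1, np.clip(p, 0, 1))         # binary outcome

Z = sm.add_constant(np.column_stack([E, X]))  # design matrix Z = (1, E, X)

# Poisson GLM point estimates solve (2); cov_type="HC0" requests the
# robust sandwich covariance, i.e., the form given in (3)
fit = sm.GLM(Y, Z, family=sm.families.Poisson()).fit(cov_type="HC0")
print(fit.params)                             # estimated beta
print(fit.bse)                                # robust (sandwich) standard errors
```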
Distributed algorithm for conducting modified Poisson regression in multi-center studies
Suppose the n individuals’ data are physically stored in K data partners that are unable to share their individual-level data with the analysis center. For k = 1, …, K, let Ωk denote the set of indexes of individuals who are members of the k th data partner.
When the individual-level data are available to the analysis center, the estimator \( \hat{\boldsymbol{\beta}} \) and its corresponding variance estimator can be obtained with off-the-shelf statistical software. However, when the individual-level data are not available, (2) cannot be directly solved for β to obtain \( \hat{\boldsymbol{\beta}} \), and \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) cannot be directly calculated using (3). Here we describe a distributed algorithm for multi-center studies that, without pooling individual-level data, produces \( \hat{\boldsymbol{\beta}} \) and \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) identical to those obtained from the pooled individual-level analysis.
We leverage the Newton-Raphson method so that \( \hat{\boldsymbol{\beta}} \) can be obtained through an iterative procedure in which only summary-level information is shared between the data partners and the analysis center at each iteration. The r th iterated estimate of β under the Newton-Raphson method is
$$ {\boldsymbol{\beta}}^{\left(\mathrm{r}\right)}={\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}-{\left\{{\boldsymbol{H}}^{(r)}\right\}}^{-1}{\boldsymbol{S}}^{(r)} $$
(4)
where \( {\boldsymbol{\beta}}^{\left(r-1\right)} \) is the (r − 1) th iterated estimate of β, \( {\boldsymbol{S}}^{(r)}=\sum \limits_{i=1}^n\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right)\right\}{\boldsymbol{Z}}_i \), and \( {\boldsymbol{H}}^{(r)}=-\sum \limits_{i=1}^n\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T. \)
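As a minimal Python/NumPy sketch (the function name is ours and purely illustrative), a single pooled update (4) can be written as:

```python
import numpy as np

def newton_raphson_step(Z, Y, beta_prev):
    """One pooled Newton-Raphson update (4) for the estimating equation (2).

    Z is the n-by-p design matrix, Y the binary outcome vector, and
    beta_prev the previous iterate beta^(r-1)."""
    mu = np.exp(Z @ beta_prev)                # exp(Z_i^T beta^(r-1)) for all i
    S = Z.T @ (Y - mu)                        # score S^(r)
    H = -(Z * mu[:, None]).T @ Z              # Hessian H^(r)
    return beta_prev - np.linalg.solve(H, S)  # beta^(r) = beta^(r-1) - {H^(r)}^{-1} S^(r)
```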
We observe that \( {\boldsymbol{S}}^{(r)} \) and \( {\boldsymbol{H}}^{(r)} \) can be rewritten as sums of site-specific quantities:
$$ {\boldsymbol{S}}^{(r)}=\sum \limits_{k=1}^K{{\boldsymbol{S}}_k}^{(r)} $$
(5)
where
$$ {{\boldsymbol{S}}_k}^{(r)}=\sum \limits_{i\in {\varOmega}_k}\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right)\right\}{\boldsymbol{Z}}_i $$
(6)
and
$$ {\boldsymbol{H}}^{(r)}=\sum \limits_{k=1}^K{{\boldsymbol{H}}_k}^{(r)} $$
(7)
where
$$ {{\boldsymbol{H}}_k}^{(r)}=-\sum \limits_{i\in {\varOmega}_k}\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$
(8)
Therefore, to calculate \( {\boldsymbol{S}}^{(r)} \) and \( {\boldsymbol{H}}^{(r)} \), each data partner k = 1, …, K only needs to calculate and share with the analysis center the summary-level information \( {{\boldsymbol{S}}_k}^{(r)} \) and \( {{\boldsymbol{H}}_k}^{(r)} \).
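The computation that stays inside each data partner is small. A minimal Python/NumPy sketch of the site-specific quantities (6) and (8) follows; the function name is illustrative and not from the original paper.

```python
import numpy as np

def partner_summaries(Z_k, Y_k, beta_prev):
    """Summary-level quantities (6) and (8) computed inside data partner k.

    Only S_k and H_k (a p-vector and a p-by-p matrix) leave the site;
    the individual-level Z_k and Y_k never do."""
    mu_k = np.exp(Z_k @ beta_prev)
    S_k = Z_k.T @ (Y_k - mu_k)               # equation (6)
    H_k = -(Z_k * mu_k[:, None]).T @ Z_k     # equation (8)
    return S_k, H_k
```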
Next, consider estimation of the variance of \( \hat{\boldsymbol{\beta}} \). We observe that
$$ \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{k=1}^K{\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) $$
(9)
where
$$ {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right)=-\sum \limits_{i\in {\varOmega}_k}\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$
(10)
and
$$ \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{k=1}^K{\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) $$
(11)
where
$$ {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{i\in {\varOmega}_k}{\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right)\right\}}^2{\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$
(12)
To calculate \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right) \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right) \), each data partner k = 1, …, K only needs to calculate and share the summary-level information \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) after receiving the value of \( \hat{\boldsymbol{\beta}} \) from the analysis center. Unlike in the estimation of β, no iterations are needed in the sandwich variance estimation.
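Analogously, the site-specific pieces (10) and (12) of the sandwich variance require only one pass through the local data once \( \hat{\boldsymbol{\beta}} \) is known. A minimal Python/NumPy sketch (illustrative function name):

```python
import numpy as np

def partner_variance_pieces(Z_k, Y_k, beta_hat):
    """Summary-level quantities (10) and (12) for the sandwich variance,
    computed inside data partner k after beta_hat is received."""
    mu_k = np.exp(Z_k @ beta_hat)
    resid_k = Y_k - mu_k
    H_k = -(Z_k * mu_k[:, None]).T @ Z_k             # equation (10)
    B_k = (Z_k * (resid_k ** 2)[:, None]).T @ Z_k    # equation (12)
    return H_k, B_k
```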
We summarize our distributed algorithm for conducting modified Poisson regression in multi-center studies below.
Point estimation
Step 0 (Determination of starting values)
The analysis center specifies the starting values for the components of \( {\boldsymbol{\beta}}^{(0)} \) and sends these values to all data partners. Then, for each iteration r until the convergence criterion is met, the following two steps are repeated:
Step 1 (r th iteration of data partners)
Each data partner k = 1, …, K calculates \( {{\boldsymbol{S}}_k}^{(r)} \) and \( {{\boldsymbol{H}}_k}^{(r)} \) using (6) and (8), respectively, based on \( {\boldsymbol{\beta}}^{\left(r-1\right)} \) received from the analysis center. All data partners then share the values of \( {{\boldsymbol{S}}_k}^{(r)} \) and \( {{\boldsymbol{H}}_k}^{(r)} \) with the analysis center.
Step 2 (r th iteration of the analysis center)
The analysis center calculates \( {\boldsymbol{S}}^{(r)} \) and \( {\boldsymbol{H}}^{(r)} \) using (5) and (7), respectively. The analysis center then calculates \( {\boldsymbol{\beta}}^{(r)} \) using (4) and shares the value of \( {\boldsymbol{\beta}}^{(r)} \) with all data partners.
The iteration procedure is considered to have converged when the change in the estimates between iterations is within a user-specified tolerance value. In the numerical studies presented later, we considered the convergence criterion to be met at the (R + 1) th iteration if \( \underset{l}{\max}\left|{\delta_l}^{\left(\mathrm{R}+1\right)}\right|<{10}^{-8} \), where \( {\delta_l}^{\left(\mathrm{R}+1\right)}={\beta_l}^{\left(\mathrm{R}+1\right)}-{\beta_l}^{\left(\mathrm{R}\right)} \) if \( \left|{\beta_l}^{\left(\mathrm{R}\right)}\right|<0.01 \) and \( {\delta_l}^{\left(\mathrm{R}+1\right)}=\left({\beta_l}^{\left(\mathrm{R}+1\right)}-{\beta_l}^{\left(\mathrm{R}\right)}\right)/{\beta_l}^{\left(\mathrm{R}\right)} \) otherwise, and \( {\beta_l}^{\left(\mathrm{R}\right)} \) is the l th element of \( {\boldsymbol{\beta}}^{\left(\mathrm{R}\right)} \). Once convergence is achieved, the analysis center shares the final estimate \( \hat{\boldsymbol{\beta}} \) with all data partners.
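Putting Steps 0-2 together, the point-estimation loop can be sketched end to end in Python/NumPy, including the convergence criterion described above. The function below is illustrative (its name and structure are ours, not from the original paper); in a real deployment each (Z_k, Y_k) pair would remain behind its data partner's firewall and only the summary-level quantities would be transmitted.

```python
import numpy as np

def distributed_point_estimation(partner_data, beta0, tol=1e-8, max_iter=100):
    """Illustrative sketch of Steps 0-2 of the point-estimation algorithm.

    partner_data is a list of (Z_k, Y_k) pairs, one per data partner."""
    beta = np.asarray(beta0, dtype=float)     # Step 0: starting values beta^(0)
    p = beta.size
    for _ in range(max_iter):
        S, H = np.zeros(p), np.zeros((p, p))
        for Z_k, Y_k in partner_data:         # Step 1: each partner computes (6) and (8)
            mu_k = np.exp(Z_k @ beta)
            S += Z_k.T @ (Y_k - mu_k)
            H += -(Z_k * mu_k[:, None]).T @ Z_k
        beta_new = beta - np.linalg.solve(H, S)   # Step 2: analysis center applies (4)
        # Convergence criterion from the text: absolute change for small
        # coefficients, relative change otherwise
        denom = np.where(np.abs(beta) < 0.01, 1.0, beta)
        delta = (beta_new - beta) / denom
        beta = beta_new
        if np.max(np.abs(delta)) < tol:
            break
    return beta
```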
Variance estimation
Step 1 (Calculation of summary-level information by data partners)
Each data partner k = 1, …, K calculates \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) using (10) and (12), respectively, and then shares the values of \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) with the analysis center.
Step 2 (Calculation of the variance estimate by the analysis center)
The analysis center calculates \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right) \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right) \) using (9) and (11), respectively, and then calculates the estimated variance \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) using (3).
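The analysis-center side of variance estimation is a single aggregation step. A minimal Python/NumPy sketch (illustrative function name), assuming each data partner has returned its \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \):

```python
import numpy as np

def distributed_sandwich_variance(partner_pieces):
    """Analysis-center side of variance estimation: sum the site-specific
    pieces as in (9) and (11) and apply the sandwich formula (3).

    partner_pieces is a list of (H_k, B_k) pairs returned by the data partners."""
    H = sum(H_k for H_k, _ in partner_pieces)   # equation (9)
    B = sum(B_k for _, B_k in partner_pieces)   # equation (11)
    H_inv = np.linalg.inv(H)
    return H_inv @ B @ H_inv                    # estimated variance in (3)
```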
Because each step above is mathematically equivalent to its pooled counterpart, this procedure provides the same point estimates and sandwich variance estimates as an analysis that uses individual-level data pooled across data partners.
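This equivalence is straightforward to check numerically. The following hedged sketch reuses the illustrative distributed_point_estimation function from above and compares it against a pooled Poisson GLM with robust standard errors fitted in statsmodels; the simulated data and partitioning are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

# Simulate pooled data, then split it across K hypothetical data partners
rng = np.random.default_rng(1)
n, K = 3000, 3
X = rng.normal(size=n)
E = rng.binomial(1, 0.5, size=n)
Y = rng.binomial(1, np.clip(np.exp(-2.0 + 0.5 * E + 0.3 * X), 0, 1))
Z = sm.add_constant(np.column_stack([E, X]))
partners = [(Z[k::K], Y[k::K]) for k in range(K)]

# Distributed estimate (summary-level sharing only) vs. pooled off-the-shelf fit
beta_dist = distributed_point_estimation(partners, np.zeros(Z.shape[1]))
pooled = sm.GLM(Y, Z, family=sm.families.Poisson()).fit(cov_type="HC0")
print(np.allclose(beta_dist, pooled.params))  # expected: True, up to numerical tolerance
```

The sandwich variance can be checked analogously by passing the site-specific pieces from partner_variance_pieces to distributed_sandwich_variance and comparing with the pooled robust covariance.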