### Theory of modified Poisson regression for pooled individual-level data

We begin by describing the general theory of modified Poisson regression in single-database studies. Let *X* be a vector of covariates, *E* a binary exposure indicator (*E* = 1 if exposed and *E* = 0 if unexposed), and *Y* the binary outcome variable (*Y* = 1 if the outcome occurs and *Y* = 0 otherwise).

Let *Z* be a vector of information on the exposure and covariates. Specifically, *Z =* (1, *g*(*E*, *X*^{T}))^{T} where *g*(*E*, *X*^{T}) is a vector containing a specified function of *E* and *X*. Assume the risk of the outcome conditional on *E* and *X* can be written as

$$ P\left(Y=1|E,\boldsymbol{X}\right)=\exp \left({\boldsymbol{\beta}}^T\boldsymbol{Z}\right) $$

(1)

where *β* is an unknown vector of parameters. An example of model 1 is \( P\left(Y=1|E,\boldsymbol{X}\right)=\exp \left({\beta}_0+{\beta}_E\ E+{\boldsymbol{\beta}}_{\boldsymbol{X}}^T\boldsymbol{X}\right) \). In this special case, the risk ratio of the outcome comparing the exposed to unexposed and adjusting for covariates *X* is given by *P*(*Y* = 1| *E* = 1, *X*)/*P*(*Y* = 1| *E* = 0, *X*) = exp(*β*_{E}).

Alternatively, we might assume a more flexible model that allows interactions between *E* and *X*: \( P\left(Y=1|\boldsymbol{X},E\right)=\exp \left({\beta}_0+{\beta}_E\ E+{\boldsymbol{\beta}}_{\boldsymbol{X}}^T\boldsymbol{X}+{\boldsymbol{\beta}}_{E\boldsymbol{X}}^TE\boldsymbol{X}\right) \). Under this model, the risk ratio of the outcome comparing the exposed to unexposed and adjusting for covariates *X* is given by \( P\left(Y=1|E=1,\boldsymbol{X}\right)/P\left(Y=1|E=0,\boldsymbol{X}\right)=\exp \left({\beta}_E+{\boldsymbol{\beta}}_{E\boldsymbol{X}}^T\boldsymbol{X}\right) \), which depends on the value of *X*.
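As a small hedged illustration of the interaction model above (the coefficient values and variable names here are invented for the example, not taken from any study), the following Python sketch evaluates the log-linear risk model and confirms that the adjusted risk ratio equals exp(*β*_{E} + *β*_{EX}^{T}*X*), so that it varies with *X*:

```python
import numpy as np

# Illustrative coefficient values (not estimates from any data)
beta0, beta_E = -2.0, 0.4          # intercept and main exposure effect
beta_X = np.array([0.1])           # covariate effect (one covariate here)
beta_EX = np.array([0.2])          # exposure-covariate interaction

def risk(E, X):
    """P(Y = 1 | E, X) = exp(beta0 + beta_E*E + beta_X'X + beta_EX'(E*X))."""
    return np.exp(beta0 + beta_E * E + beta_X @ X + beta_EX @ (E * X))

X = np.array([0.5])
rr = risk(1, X) / risk(0, X)       # adjusted risk ratio at this value of X
# algebraically, rr = exp(beta_E + beta_EX' X)
assert np.isclose(rr, np.exp(beta_E + beta_EX @ X))
```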

Suppose we have an independent and identically distributed sample of size *n*. For each individual *i*, the following variables are measured: Let *X*_{i} be a vector of covariates, *E*_{i} a binary exposure indicator (*E*_{i} = 1 if exposed and *E*_{i} = 0 if unexposed), *Y*_{i} the binary outcome variable (*Y*_{i} = 1 if the outcome occurs and *Y*_{i} = 0 otherwise), and *Z*_{i} a vector of information on the exposure and covariates, i.e., *Z*_{i} *=* (1, *g*(*E*_{i}, *X*_{i}^{T}))^{T}.

Zou [12] provided the theoretical justification for his proposed approach in the setting of a 2 by 2 table (a binary exposure and no covariates). The justification for this approach can be established more generally using the theory of unbiased estimating equations [20]. Provided model 1 is correctly specified, we have *E*[{*Y* − exp(*Z*^{T}*β*)}*Z*] = **0**, which leads to the unbiased estimating equation

$$ \sum \limits_{i=1}^n\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\boldsymbol{\beta} \right)\right\}{\boldsymbol{Z}}_i=\mathbf{0} $$

(2)

Solving (2) for *β* gives \( \hat{\boldsymbol{\beta}} \), a consistent and asymptotically normal estimator for the true *β*.

A consistent estimator of the variance of \( \hat{\boldsymbol{\beta}} \) is then given by the sandwich variance estimator [20]

$$ \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right)=\left\{\boldsymbol{H}{\left(\hat{\boldsymbol{\beta}}\right)}^{-1}\right\}\boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)\left\{\boldsymbol{H}{\left(\hat{\boldsymbol{\beta}}\right)}^{-1}\right\} $$

(3)

where \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right)=-\sum \limits_{i=1}^n\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{i=1}^n{\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right)\right\}}^2{\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T \). Zou [12] referred to this procedure as *modified Poisson regression* because (2) is equivalent to the score equation for the Poisson likelihood but the variance estimator does not rely on the Poisson distribution assumption (clearly unreasonable for binary outcomes).
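The single-database procedure can be sketched in a few lines of NumPy. This is an illustrative implementation on simulated data, not the software used in practice: it solves the estimating equation (2) by Newton-Raphson and then forms the sandwich variance estimator (3).

```python
import numpy as np

# Simulate data from a correctly specified log-linear risk model
rng = np.random.default_rng(0)
n = 5000
E = rng.integers(0, 2, n)
X = rng.normal(size=n)
p = np.exp(-2.0 + 0.5 * E + 0.3 * X)       # true P(Y = 1 | E, X)
Y = rng.binomial(1, np.clip(p, 0, 1))
Z = np.column_stack([np.ones(n), E, X])    # Z_i = (1, E_i, X_i)^T

# Solve the estimating equation (2) by Newton-Raphson
beta = np.zeros(3)
for _ in range(50):
    mu = np.exp(Z @ beta)
    S = Z.T @ (Y - mu)                     # left-hand side of (2)
    H = -(Z.T * mu) @ Z                    # Hessian of the Poisson score
    step = np.linalg.solve(H, S)
    beta = beta - step                     # update as in (4)
    if np.max(np.abs(step)) < 1e-10:
        break

# Sandwich variance estimator (3)
mu = np.exp(Z @ beta)
H = -(Z.T * mu) @ Z
B = (Z.T * (Y - mu) ** 2) @ Z
Hinv = np.linalg.inv(H)
var = Hinv @ B @ Hinv
se = np.sqrt(np.diag(var))                 # robust standard errors
```

The exponential of the second element of `beta` estimates the adjusted risk ratio exp(*β*_{E}); the robust standard errors come from `se`, not from the Poisson model's own variance.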

### Distributed algorithm for conducting modified Poisson regression in multi-center studies

Suppose the *n* individuals’ data are physically stored in *K* data partners that are unable to share their individual-level data with the analysis center. For *k* = 1, …, *K*, let *Ω*_{k} denote the set of indexes of individuals who are members of the *k* ^{th} data partner.

When the individual-level data are available to the analysis center, the estimator \( \hat{\boldsymbol{\beta}} \) and its corresponding variance estimator can be obtained with off-the-shelf statistical software. However, when the individual-level data are not available, (2) cannot be directly solved for *β* to obtain \( \hat{\boldsymbol{\beta}} \), and \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) cannot be directly calculated using (3). Here we describe a distributed algorithm for multi-center studies that, without pooling any individual-level data, produces \( \hat{\boldsymbol{\beta}} \) and \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) identical to those from the pooled individual-level analysis.

We leverage the Newton-Raphson method such that \( \hat{\boldsymbol{\beta}} \) can be obtained using an iteration-based procedure with only summary-level information being shared between the data partners and the analysis center in each iteration. The *r* ^{th} iterated estimate of *β* using the Newton-Raphson method is

$$ {\boldsymbol{\beta}}^{\left(\mathrm{r}\right)}={\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}-{\left\{{\boldsymbol{H}}^{(r)}\right\}}^{-1}{\boldsymbol{S}}^{(r)} $$

(4)

where *β*^{(r − 1)} is the (*r* − 1) ^{th} iterated estimate of *β*, \( {\boldsymbol{S}}^{(r)}=\sum \limits_{i=1}^n\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right)\right\}{\boldsymbol{Z}}_i \), and \( {\boldsymbol{H}}^{(r)}=-\sum \limits_{i=1}^n\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T. \)

We observe that *S*^{(r)} and *H*^{(r)} can be rewritten as sums of site-specific quantities:

$$ {\boldsymbol{S}}^{(r)}=\sum \limits_{k=1}^K{{\boldsymbol{S}}_k}^{(r)} $$

(5)

where

$$ {{\boldsymbol{S}}_k}^{(r)}=\sum \limits_{i\in {\varOmega}_k}\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right)\right\}{\boldsymbol{Z}}_i $$

(6)

and

$$ {\boldsymbol{H}}^{(r)}=\sum \limits_{k=1}^K{{\boldsymbol{H}}_k}^{(r)} $$

(7)

where

$$ {{\boldsymbol{H}}_k}^{(r)}=-\sum \limits_{i\in {\varOmega}_k}\exp \left({{\boldsymbol{Z}}_i}^T{\boldsymbol{\beta}}^{\left(\mathrm{r}-1\right)}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$

(8)

Therefore, to calculate *S*^{(r)} and *H*^{(r)}, each data partner *k* = 1, …, *K* only needs to calculate and share with the analysis center the summary-level information *S*_{k}^{(r)} and *H*_{k}^{(r)}.
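The decomposition in (5)–(8) is purely algebraic, so the site-specific quantities, summed at the analysis center, reproduce the pooled quantities exactly. A minimal NumPy sketch (simulated data and an illustrative value of *β*^{(r − 1)}, chosen only for demonstration):

```python
import numpy as np

# Simulated covariate matrix Z and outcome Y for n individuals
rng = np.random.default_rng(1)
n, K = 900, 3
Z = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
Y = rng.integers(0, 2, n)
beta_prev = np.array([-1.0, 0.2, 0.1])   # beta^(r-1), illustrative

# Pooled S^(r) and H^(r), as if individual-level data were centralized
mu = np.exp(Z @ beta_prev)
S_pooled = Z.T @ (Y - mu)
H_pooled = -(Z.T * mu) @ Z

# Site-by-site: partner k computes S_k^(r), H_k^(r) on its own rows only
parts = np.array_split(np.arange(n), K)   # index sets Omega_k
S_sum = sum(Z[i].T @ (Y[i] - mu[i]) for i in parts)
H_sum = sum(-(Z[i].T * mu[i]) @ Z[i] for i in parts)

assert np.allclose(S_sum, S_pooled)       # eq. (5)
assert np.allclose(H_sum, H_pooled)       # eq. (7)
```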

Next, consider estimation of the variance of \( \hat{\boldsymbol{\beta}} \). We observe that

$$ \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{k=1}^K{\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) $$

(9)

where

$$ {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right)=-\sum \limits_{i\in {\varOmega}_k}\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right){\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$

(10)

and

$$ \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{k=1}^K{\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) $$

(11)

where

$$ {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right)=\sum \limits_{i\in {\varOmega}_k}{\left\{{Y}_i-\exp \left({{\boldsymbol{Z}}_i}^T\hat{\boldsymbol{\beta}}\right)\right\}}^2{\boldsymbol{Z}}_i{{\boldsymbol{Z}}_i}^T $$

(12)

To calculate \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right) \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right) \), each data partner *k* = 1, …, *K* only needs to calculate and share the summary-level information \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) after receiving the value of \( \hat{\boldsymbol{\beta}} \) from the analysis center. Unlike in the estimation of *β*, no iterations are needed in the sandwich variance estimation.
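Like the score and Hessian, the variance building blocks in (9)–(12) are exact sums of site-specific matrices, shared in a single non-iterative round. A short sketch (simulated data and an illustrative \( \hat{\boldsymbol{\beta}} \)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 600, 3
Z = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
Y = rng.integers(0, 2, n)
beta_hat = np.array([-1.2, 0.3, 0.05])    # illustrative final estimate

mu = np.exp(Z @ beta_hat)
resid2 = (Y - mu) ** 2
H_pooled = -(Z.T * mu) @ Z                # "bread" ingredient of (3)
B_pooled = (Z.T * resid2) @ Z             # "meat" ingredient of (3)

# Each partner k computes H_k and B_k on its own rows, eqs. (10) and (12)
parts = np.array_split(np.arange(n), K)   # index sets Omega_k
H_sum = sum(-(Z[i].T * mu[i]) @ Z[i] for i in parts)
B_sum = sum((Z[i].T * resid2[i]) @ Z[i] for i in parts)

assert np.allclose(H_sum, H_pooled)       # eq. (9)
assert np.allclose(B_sum, B_pooled)       # eq. (11)
```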

We summarize our distributed algorithm for conducting modified Poisson regression in multi-center studies below.

#### Point estimation

##### Step 0 (Determination of starting values)

The analysis center specifies the starting values for the components of *β*^{(0)} and sends these values to all data partners. Then, for each iteration *r* until the convergence criteria are met, the following two steps are repeated:

##### Step 1 (*r* ^{th} iteration of data partners)

Each data partner *k* = 1, …, *K* calculates *S*_{k}^{(r)} and *H*_{k}^{(r)} using (6) and (8), respectively, based on *β*^{(r − 1)} received from the analysis center. All data partners then share the values of *S*_{k}^{(r)} and *H*_{k}^{(r)} with the analysis center.

##### Step 2 (*r* ^{th} iteration of the analysis center)

The analysis center calculates *S*^{(r)} and *H*^{(r)} using (5) and (7), respectively. The analysis center then calculates *β*^{(r)} using (4) and shares the value of *β*^{(r)} with all data partners.

The iteration procedure is considered to have converged when the change in the estimates between iterations is within a user-specified tolerance value. In the numerical studies presented later, we considered the convergence criterion to be met at the (*R* + 1) ^{th} iteration if \( \underset{l}{\max}\left|{\delta_l}^{\left(\mathrm{R}+1\right)}\right|<{10}^{-8} \), where *δ*_{l}^{(R + 1)} = *β*_{l}^{(R + 1)} − *β*_{l}^{(R)} if ∣*β*_{l}^{(R)} ∣ < 0.01 and *δ*_{l}^{(R + 1)} *=* (*β*_{l}^{(R + 1)} − *β*_{l}^{(R)})/*β*_{l}^{(R)} otherwise, and *β*_{l}^{(R)} is the *l* ^{th} element of *β*^{(R)}. Once convergence is achieved, the analysis center shares the final estimate \( \hat{\boldsymbol{\beta}} \) with all data partners.
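The convergence rule above, absolute change for coefficients near zero and relative change otherwise, can be written compactly as follows (a sketch; the function name is ours):

```python
import numpy as np

def converged(beta_new, beta_old, tol=1e-8):
    """Check max |delta_l| < tol, with delta_l absolute when
    |beta_old_l| < 0.01 and relative otherwise."""
    beta_new, beta_old = np.asarray(beta_new), np.asarray(beta_old)
    delta = np.where(np.abs(beta_old) < 0.01,
                     beta_new - beta_old,                # absolute change
                     (beta_new - beta_old) / beta_old)   # relative change
    return np.max(np.abs(delta)) < tol

assert converged([0.5, -1.2], [0.5, -1.2])
assert not converged([0.5, -1.2], [0.5001, -1.2])
```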

#### Variance estimation

##### Step 1 (Calculation of summary-level information by data partners)

Each data partner *k* = 1, …, *K* calculates \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) using (10) and (12), respectively, and then shares the values of \( {\boldsymbol{H}}_k\left(\hat{\boldsymbol{\beta}}\right) \) and \( {\boldsymbol{B}}_k\left(\hat{\boldsymbol{\beta}}\right) \) with the analysis center.

##### Step 2 (Calculation of the variance estimate by the analysis center)

The analysis center calculates \( \boldsymbol{H}\left(\hat{\boldsymbol{\beta}}\right) \) and \( \boldsymbol{B}\left(\hat{\boldsymbol{\beta}}\right) \) using (9) and (11), respectively, and then calculates the estimated variance \( \hat{\mathit{\operatorname{var}}}\left(\hat{\boldsymbol{\beta}}\right) \) using (3).

Because the two sets of computations are mathematically equivalent, the above procedure provides the same point estimates and sandwich variance estimates as an analysis that uses individual-level data pooled across data partners.
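This equivalence can be checked numerically. The following sketch (simulated data; helper names are ours) runs the same Newton-Raphson and sandwich computations once with *K* partitions, where each partition contributes only its summary-level *S*_{k}, *H*_{k}, and *B*_{k}, and once on the pooled data, and verifies that the results coincide:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 3000, 4
E = rng.integers(0, 2, n)
X = rng.normal(size=n)
p = np.exp(-2.0 + 0.4 * E + 0.2 * X)
Y = rng.binomial(1, np.clip(p, 0, 1))
Z = np.column_stack([np.ones(n), E, X])
parts = np.array_split(np.arange(n), K)      # Omega_k, one per partner

def fit(index_sets):
    """Newton-Raphson where each index set contributes only its
    summary-level pieces; sandwich variance assembled the same way."""
    beta = np.zeros(3)
    for _ in range(100):
        S = sum(Z[i].T @ (Y[i] - np.exp(Z[i] @ beta)) for i in index_sets)
        H = sum(-(Z[i].T * np.exp(Z[i] @ beta)) @ Z[i] for i in index_sets)
        step = np.linalg.solve(H, S)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-12:
            break
    H = sum(-(Z[i].T * np.exp(Z[i] @ beta)) @ Z[i] for i in index_sets)
    B = sum((Z[i].T * (Y[i] - np.exp(Z[i] @ beta)) ** 2) @ Z[i]
            for i in index_sets)
    Hinv = np.linalg.inv(H)
    return beta, Hinv @ B @ Hinv

beta_dist, var_dist = fit(parts)             # distributed: K data partners
beta_pool, var_pool = fit([np.arange(n)])    # pooled: one "partner"
assert np.allclose(beta_dist, beta_pool)
assert np.allclose(var_dist, var_pool)
```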