Notation
A framework for longitudinal causal inference is considered for n individuals \((i=1,...,n)\) in \(m_i\) time-intervals \((j=1,...,m_i)\). An equally spaced time interval is denoted as \(\{T_{i1}< ... < T_{ik_i}\}\). The time-dependent binary treatment is denoted as \(A_i(T_{ij})=A_{ij}\), time-dependent binary confounder is denoted as \(L_i(T_{ij})=L_{ij}\), the occurrence of visit is denoted as \(V_i(T_{ij})=V_{ij}\), the continuous outcome is denoted as \(Y_i(T_{ij})=Y_{ij}\). The vector of baseline covariates for individual i is denoted as \(X_{i}\). The latent confounder \(\eta _i\) is defined as a common cause of treatment \(A_{ij}\) and outcome \(Y_{ij}\) for individual i. In some instances, the index for individual i is suppressed because it is assumed that the random vector for each individual i is drawn independently with respect to other individuals. The cumulative treatment status \(\bar{A}_{ij}\), cumulative confounder status \(\bar{L}_{ij}\), cumulative visit status \(\bar{V}_{ij}\) denotes the complete history of each factor up to and including visit j. As an example, \(\bar{A}_{ij}= \{A_{i0},A_{i1},...A_{ij}\}\); \(\bar{L}_{ij}= \{L_{i0},L_{i1},...L_{ij}\}\); \(\bar{V}_{ij}= \{V_{i0},V_{i1},...V_{ij}\}\). We denote observed history as \(\bar{{H}}_{j-1} \equiv \{\bar{V}_{j-1}, \bar{Y}_{j-1}, \bar{L}_{j-1},\bar{A}_{j-1},X\}\). We introduce \(Y^{\bar{a},\bar{v}}_j\) and \(L_j^{\bar{a},\bar{v}}\) as potential outcome and potential confounder (respectively) under treatment regime \(\bar{a}\) and visit regime \(\bar{v}\) with respect to the \(j^{th}\) visit.
Marginal structural model
The marginal causal effects of treatment are specified through a parametric marginal structural model with respect to the longitudinal outcome as
$$\begin{aligned} g^{-1}(E( Y_{ij}^{\bar{a},\bar{v}} | X_i )) = h(\bar{a},\bar{v}, X_i; \theta )) \end{aligned}$$
(1)
where \(Y_j^{\bar{a},\bar{v}}\) is the potential outcome indexed with respect to hypothetical treatment \(\bar{a}\) and hypothetical visit \(\bar{v}\). The link function is represented using an identity function \(g^{-1}(\cdot )\). We assume a rank-preserving model in which the net change in potential outcome (i.e. \(E( Y_j^{\bar{a},\bar{v}} | \bar{{H}}^*_{j-1})\)) is preserved with respect to the treatment and visit for all subjects (i.e. absence of effect modification) [14]. In the context of EHRs, the causal contrast correspond to net change in HbA1c with respect to glucose-lowering medications. We are interested in accounting for three sources biases that distort the causal estimator: (i) measured confounding arising due to time-dependent covariates, (ii) selection bias arising due to irregular visits (i.e. missing at random), and (iii) subject-specific unmeasured confounder.
We use the generalized estimating equations with inverse probability weights to obtain an estimate of the marginal treatment effect with respect to time-dependent covariates. We describe the marginal structural model using the score function of weighted generalized estimating equations as
$$\begin{aligned} S(\beta )= \sum\limits_{i=1}^n \frac{\partial \mu _i^\prime }{\partial \beta } \nu _i^{-1} SW_i (Y_i - \mu _i(\beta )) = 0 \end{aligned}$$
(2)
where \(SW_i\) are the inverse probability weights with stabilizing factor, \(\nu _i\) is the working covariance matrix of \(Y_i\) and it is specified through working correlation matrix \(R(\alpha )\), and \(\mu _i(\beta )\) is the mean vector. The correct specification of the correlation matrix \(R(\alpha )\) improves the efficiency of estimation while misspecification may still lead to consistent estimators.
The longitudinal weights with stabilization factor are constructed to account for treatment-confounder and visit-confounder feedback as
$$\begin{aligned} SW^{\bar{A}}_t=\prod\limits_{j=1}^{t} \frac{ Pr(A_{j}|\bar{{H}}^*_{j-1} )}{ Pr(A_{j}|\bar{H}_{j-1} )}; \quad SW^{\bar{V}}_t=\prod\limits_{j=1}^{t} \frac{ Pr(V_{j}| \bar{{H}}^{**}_{j-1})}{ Pr(V_{j}| \bar{H}_{j-1})} \end{aligned}$$
$$\begin{aligned} SW^{\bar{A},\bar{V}}_t= SW_t^{\bar{A}} \times SW_t^{\bar{V}}. \end{aligned}$$
where the product terms are defined over cumulative discrete-time intervals. The observed partial history with the exclusion of time-dependent covariates for treatment and visits is denoted as \(\bar{{H}}^*_{j-1}\) and \(\bar{{H}}^{**}_{j-1}\). Since the time-varying treatment \(A_j\) and time-varying visit \(V_j\) are statistically endogenous, the stabilized inverse probability treatment weights \(SW^{\bar{A}}_t\), stabilized inverse probability visit weights \(SW^{\bar{V}}_t\) and joint inverse probability weights \(SW^{\bar{A},\bar{V}}_t\) are required to estimate the marginal causal effect of glucose-lowering medications on hemoglobin A1c. The stabilized treatment weights are used to create a pseudo-population in which the imbalance for time-dependent covariates across treatment groups is reduced. The stabilized visit weights are used to create a pseudo-population in which the selection bias due to irregular visits (missing at random) is reduced. The pseudo-population generated using the joint weights \(SW^{\bar{A},\bar{V}}_t\) incorporate both sources of biases (i.e. confounding bias and selection bias).
Assumptions
Longitudinal causal inference with time-varying treatment use the sequential version of identifiability assumptions: (i) latent sequential exchangeability, (ii) sequential postivity and (iii) sequential consistency. The sequential exchangeability assumption is an extension of conditional exchangeability (or no unmeasured confounding) assumption where the probability of treatment assignment and visit occurrence at \(j^{th}\) visit depends on past treatment, past visit and covariate history, and conditional on these factors the potential outcome and potential confounder is independent of the treatment assignment. The latent sequential exchangeability assumption (or equivalently latent ignorability assumption) can be described as
$$\begin{aligned} \{Y^{\bar{a},\bar{v}}_{j}, L^{\bar{a},\bar{v}}_{j} \}{\perp \! \! \! \perp } \{A_{j}, V_{j} \}| \{ \bar{{H}}_{j-1}, \eta _{i}\}. \end{aligned}$$
The sequential positivity assumption is defined as a non-zero probability of treatment assignment and observed visit at each time interval j given the history \(\bar{{H}}_{j-1}\). We describe sequential positivity as \(P(A_j |\bar{{H}}_{j-1}) >0\) and \(P(V_j| \bar{{H}}_{j-1}) > 0\) for \(\forall \bar{a}, \bar{v}\). The sequential consistency assumption links the potential outcome \(Y_j^{\bar{a},\bar{v}}\) and confounder \(L_j^{\bar{a},\bar{v}}\) to the observed outcome and confounder as
$$\begin{aligned} Y_{j}^{\bar{a},\bar{v}}=Y_{j} \quad \quad L_{j}^{\bar{a},\bar{v}}=L_j \quad \quad \text {if}\ \bar{A}=\bar{a}\ \text {and}\ \bar{V}=\bar{v}. \end{aligned}$$
If causal identifiability assumptions are satisfied then an observational study can be used to mimic a randomized experiment with the limitation that the conditional probability of treatment assignment is unknown and need to be estimated using the data in an observational study. It is further assumed that the administrative censoring \(C_{ij}\) is non-informative where censoring is independent of observation times \(T_{ij}\) and longitudinal outcome \(Y_{ij}\) conditioned on history \(\bar{{H}}_{j-1}\) as \(C_{ij} {\perp \! \! \! \perp } \{Y_{ij},T_{ij}\} | \{ \bar{{H}}_{j-1}, \eta _{i}\}\).
Calibration of inverse probability weights
We motivate the application of calibrated longitudinal weights to estimate the causal effects of glucose lowering medications on hemoglobin A1c. Calibration has been used in causal inference framework to estimate the average treatment effect when regularizing high-dimensional covariates with lasso penalty [15], to minimize the variance of calibrated weights [16], to improve the finite sample properties of maximum likelihood estimation [17], to account for unmeasured cluster-level confounders [18].
Under finite longitudinal sample, stabilized inverse probability weights may not remove the associations between time-dependent treatment \(A_{ij}\) and time-dependent covariate \(\bar{L}_{ij}\) conditional on past treatment regimen \(\bar{A}_{ij-1}\). There may still exist chance imbalances and residual confounding of covariates in the pseudo-population (weighted using \(SW_{ij}^A\) or \(SW_{ij}^V\)) and this may contribute to finite sample estimation errors [19]. The sample estimation errors may further become exacerbated when the treatment model or the visit model are misspecified. In this article, the longitudinal inverse probability treatment and visit weights are calibrated to account for (i) associations between treatment regimen and time-dependent covariates, (ii) associations between irregular visits and time-dependent covariates, (iii) unmeasured subject-specific (i.e. time-invariant) confounder. The purpose of calibrating the longitudinal weights is to improve the small sample properties of the longitudinal weights [17], while accounting for the unmeasured subject-specific confounder and irregular visits.
The calibration restrictions are employed on inverse probability weights to make the treatment assignment unassociated with the history of the time-varying covariates at each time interval in the pseudo-population. Yiu and Su [17] proposed the calibrated inverse probability weights to improve estimation errors under finite samples using maximum likelihood estimation. Similar to Yiu and Su [17], we calibrate the treatment and visit inverse probability weights using the maximization of weighted log-likelihood functions:
$$\begin{aligned} \prod\limits_{i=1}^n \prod\limits_{j=1}^{m_i} \left\{ \prod\limits_{k=1}^j P_{\alpha _w}(A_{ik} | \bar{H}_{ik-1}; \alpha _w)\right\} ^{SW_{ij}^{A}(\lambda )} \end{aligned}$$
$$\begin{aligned} \prod\limits_{i=1}^n \prod\limits_{j=1}^{m_i} \left\{ \prod\limits_{k=1}^j P_{\omega _w}(V_{ik} | \bar{H}_{ik-1}; \omega _w)\right\} ^{SW_{ij}^{V}(\lambda )} \end{aligned}$$
after weighting the sample as
$$\begin{aligned} SW_{ij}^A({\lambda } )= SW_{ij}^A c(\bar{L}_{ij},{\lambda } ) \end{aligned}$$
$$\begin{aligned} SW_{ij}^V({\lambda } )=SW_{ij}^V c(\bar{L}_{ij},{\lambda } ) \end{aligned}$$
where \(c(\bar{L}_{ij},{\lambda } )\) is the calibration function and reduces to one when the vector of coefficients \({\lambda }=0\) (i.e. \(c(\bar{L}_{ij},{\lambda =0})=1)\). We assume a non-negative parametric form of \(c(\bar{L}_{ij},{\lambda } )= exp(K\lambda )\), where the matrix \(K \in \mathbb {R}^{N \times r}\) includes the data-dependent restrictions with \(N=\sum _{i=1}^n m_i\) observations and r columns. The unknown vector of \(\lambda \in \mathbb {R}^{r}\) is estimated using the calibration of inverse probability treatment and visit weights.
In similar spirit to Yiu and Su [20], the regression coefficient \(\alpha _w\) of treatment model are partitioned into \(\{\alpha _b,\alpha _d\}\), where \(\alpha _b\) characterize the baseline distribution (e.g. intercept term) and \(\alpha _d\) characterize the dependence of treatment assignment and covariates. Likewise, the regression coefficients of irregular visits \(\omega _w\) are partitioned into \(\{\omega _b, \omega _d\}\) with the same convention. We maximize the weighted log-likelihood function with respect to \(\lambda\) through a score equation
$$\begin{aligned} \sum\limits_{i=1}^n \sum\limits_{j=1}^{m_i} SW_{ij}^A ({\lambda }) \sum\limits_{ k=1}^j \left. \frac{\partial }{\partial \alpha _w} log(P_{\alpha _w}(A_{ik} |\bar{H}_{ik-1}; \alpha _\omega )) \right| _{\alpha _b=\hat{\alpha }, \alpha _{d}=0} = 0 \end{aligned}$$
(3)
where the innermost index k accumulates over j time intervals. The dependence of treatment \(A_{ij}\) with the time-dependent covariates \(\bar{L}_{ij}\) is represented as \(\alpha _d\) and it is constrained to be zero while the vector of treatment coefficients (excluding time-dependent covariates) is represented as \(\alpha _b\) and it is set to the maximum likelihood estimates \(\hat{\alpha }\). In addition to calibrated treatment weights, the visit weights are calibrated to reduce the association between irregular visits and time-dependent covariates through a score equation
$$\begin{aligned} \sum\limits_{i=1}^n \sum\limits_{j=1}^{m_i} SW_{ij}^V ({\lambda }) \sum\limits_{ k=1}^j \left. \frac{\partial }{\partial \omega _w} log(P_{\omega _w}(V_{ik} | \bar{H}_{ik-1};\omega _w)) \right| _{\omega _b=\hat{\omega }, \omega _{d}=0} = 0 \end{aligned}$$
(4)
where the vector of regression coefficients \(\omega _{d}=0\) denote the independence of time-dependent covariate \(\bar{L}_{i,j-1}\) and irregular visits \(V_{ij}\) (constrained to be zero) while \(\omega _b=\hat{\omega }\) is set to maximum likelihood estimates \(\hat{\omega }\).
In both score Eqs. (3) and (4), we notice that the calibrated weights (i.e. \(SW_{ij}^A(\lambda )\) and \(SW_{ij}^V(\lambda )\)) are used to weight the likelihood of treatment model and visit model (respectively) for the \(i^{th}\) patient up to and including the time interval j. The calibration restrictions are inverted to estimate the values of unknown coefficients \(\lambda\). The calibration restrictions using \(\{\alpha _d=0\}\) and \(\{\omega _d=0\}\) ensures that the treatment assignment and irregular visits are statistically exogenous with respect to the time-dependent covariates. Since the covariate balancing restrictions reduce the dependence for treatment assignment and irregular visits with respect to the functional covariate history \(\tilde{L}_{ij}\), we may represent the model-based restrictions (derived in Appendix Section) as
$$\begin{aligned} \sum\limits_{i=1}^n \sum\limits_{j=1}^{m_i} SW_{ij}^A ({\lambda }) \sum\limits_{ k=1}^j (A_{ik} - \hat{e}_{ik}^A) \tilde{L}_{ik-1} = 0 \end{aligned}$$
and
$$\begin{aligned} \sum\limits_{i=1}^n \sum\limits_{j=1}^{m_i} SW_{ij}^V ({\lambda }) \sum\limits_{ k=1}^j (V_{ik} - \hat{e}_{ik}^V) \tilde{L}_{ik-1} = 0 \end{aligned}$$
where the model-based propensity scores for treatment model \(\hat{e}_{ik}^A\) and visit model \(\hat{e}_{ik}^V\) are estimated as \(\hat{e}_{ij}^A=P_{\alpha }(A_{ij}| \bar{H}_{j-1}^*)\) and \(\hat{e}_{ij}^V=P_{\omega }(V_{ij}| \bar{H}_{j-1}^*)\). The model-based covariate balancing restrictions are accumulated with respect to longitudinal observations. The residuals for treatment (i.e. \(A_{ij} - \hat{e}_{ij}^A\)) and irregular visit (i.e. \(V_{ij} - \hat{e}_{ij}^V\)) are set to be orthogonal with respect to the functional history of time-dependent covariates \(\tilde{L}_{ij}\) (including intercept term) in the pseudo-population defined using the calibrated weights.
Unity mean restrictions
The stabilized inverse probability weights used in the pseudo-likelihood function of marginal structural models tend to satisfy the property of unity mean at each time interval (i.e. \(E(SW_{j}^A)=E(SW_{j}^V)=1 \ \forall j\)) [21]. However, this property is not guaranteed to hold for calibrated inverse probability weights. Thus, in addition to the covariate balancing score constraints, we further impose the restrictions for average calibrated weights to be one at each time interval as
$$\begin{aligned} E( SW_{ij}^{A}(\varvec{\lambda })) = \frac{1}{n} \sum\limits_{i=1}^n SW_{ij}^{A}(\varvec{\lambda })=1 \quad \quad E( SW_{ij}^{V}(\varvec{\lambda })) = \frac{1}{n} \sum\limits_{i=1}^n SW_{ij}^{V}(\varvec{\lambda })=1. \end{aligned}$$
The average treatment weights and visit weights are constrained to be equal to one at each time interval to stabilize the longitudinal weights and to prevent trivial solutions of zero (or negative weights) during the calibration procedure.
Time-invariant latent restrictions
In the presence of subject-level unmeasured confounder \(\eta _i\), the exposed and the unexposed subjects (with respect to treatment) are not conditionally exchangeable given \(\bar{H}_{j-1}\) because non-causal association between \(\bar{A}_{j}\) and \(Y_{j}\) cannot be blocked by conditioning on measured history \(\bar{H}_{j-1}\). We derive the balancing constraints for subject-specific unmeasured confounder in the context of repeated-measures longitudinal outcomes in Appendix Section. We obtain the empirical constraints to account for the unmeasured individual-level confounder \(\eta _i\) using treatment \(A_{ij}\) as
$$\begin{aligned} \left\{ \begin{array}{ll} \sum _{i=1}^n \sum _{j=1}^{m_i} SW_{ij}^A ({\lambda }) \sum _{ k=1}^j \left[ (A_{ik} - \hat{e}^A_{ik}) \times L_{ik-1} \right] &{}= 0\\ \sum _{j=1}^{m_i} SW_{ij}^A ({\lambda }) \sum _{ k=1}^j \left[ (A_{ik} - \hat{e}^A_{ik}) \right] &{}= 0 \quad \quad \forall i \end{array}\right. \end{aligned}$$
(5)
and using visit \(V_{ij}\) as
$$\begin{aligned} \left\{ \begin{array}{ll} \sum _{i=1}^n \sum _{j=1}^{m_i} SW_{ij}^V ({\lambda }) \sum _{ k=1}^j\left[ (V_{ik} - \hat{e}^V_{ik}) \times L_{ik-1} \right] &{}= 0\\ \sum _{j=1}^{m_i} SW_{ij}^V ({\lambda }) \sum _{ k=1}^j \left[ (V_{ik} - \hat{e}^V_{ik}) \right] &{}= 0 \quad \quad \forall i \end{array}\right. \end{aligned}$$
(6)
The empirical constraints in Eqs. (5) and (6) are sufficient to describe the covariate balancing restrictions with respect to the time-invariant latent confounder \(\eta _i\). These empirical restrictions balance the time-dependent covariate distribution across treatment groups and visit indication within each time interval in the presence of subject-specific latent confounder.