Skip to main content

Inverse probability weighting to handle attrition in cohort studies: some guidance and a call for caution



Attrition in cohort studies challenges causal inference. Although inverse probability weighting (IPW) has been proposed to handle attrition in association analyses, its relevance has been little studied in this context. We aimed to investigate its ability to correct for selection bias in exposure-outcome estimation by addressing an important methodological issue: the specification of the response model.


A simulation study compared the IPW method with complete-case analysis (CCA) for nine response-mechanism scenarios (3 missing at random – MAR and 6 missing not at random - MNAR). Eighteen response models differing by the type of variables included were assessed.


The IPW method was equivalent to CCA in terms of bias and consistently less efficient in all scenarios, regardless of the response model tested. The most effective response model included only the confounding factors of the association model.


Our study questions the ability of the IPW method to correct for selection bias in situations of attrition leading to missing outcomes. If the method is to be used, we encourage including only the confounding variables of the association of interest in the response model.

Peer Review reports

Cohort studies are essential for investigating associations between exposure and health outcomes thanks to their prospective design. The repeated collection of information in successive follow-ups (also called survey waves) allows studying the effects of past exposures on health outcomes occurring at inclusion or thereafter. However, such studies are known to be affected by partial and total non-response, which can invalidate the causal inference that can be drawn from them. Partial non-response refers to missing data that occasionally occurs for certain variables during a survey wave when some individuals fail or refuse to answer some of the questions. Total non-response (or attrition) occurs when a subset of individuals does not participate in one specific survey wave or quit the study completely [1]. Only the latter (drop-outs) was considered in this study.

Missing data resulting from non-response can be classified according to their postulated underlying mechanism [2]. In situations of the missing completely at random mechanism (MCAR), the probability of missing data does not depend on either the observed or unobserved values. In situations of the missing at random mechanism (MAR), it depends on the observed data but not the unobserved data. Finally, in situations of the missing not at random mechanism (MNAR), it depends on the unobserved data.

The simplest and most widely used approach to handle total non-response in cohort studies is complete-case analysis (CCA). This method assumes a MCAR mechanism and consists of studying the exposure-outcome association in the subset of respondents only. However, total non-response is generally considered to result from MAR or MNAR mechanisms. Several methodological publications have suggested the use of the inverse probability weighting (IPW) method in situations of the MAR mechanism of attrition [3, 4]. It aims to recreate a representative sample of the initial cohort by differentially weighting the so-called “complete individuals” (i.e. those who participate in the survey wave under consideration). More precisely, when modeling the association between exposure and outcome, respondents are weighted by the inverse of their probability to participate (hereinafter referred to as the “response probability” or “probability of response”). This response probability depends on some of the respondent’s characteristics. The use of the inverse of this probability implies that a respondent with a high probability of response (e.g. an individual with a high socio-economic level [5]) is given a comparatively lower weight in the analysis. The approach can be summarized as: “the respondents carry the weight of the non-respondents”.

The probability of response is unknown and needs to be estimated from the data. The first step is therefore to build a response model (logistic regression model) to obtain weights that will be used in a second step in the association model. Because the association model is only fitted among respondents or complete individuals, the method is also called “weighted complete-case analysis” [2]. It is also referred to as “inverse probability of participation/attrition weighting” (IPPW/IPAW) in the literature [6,7,8].

Originally developed for reducing the effects of confounding in observational studies (propensity score method) [9], the IPW method was extended to correct for selection biases in situations of attrition. Although researchers have already adopted the method in association studies [10, 11], guidance on its correct use is still needed, in particular regarding the specification of the response model (i.e. variables to be introduced into the response model).

In this article, we will focus on attrition resulting in a missing outcome of interest. This situation is commonly encountered in mother-child cohorts for example, where the effects of prenatal medical conditions or exposures on the future health of the children are studied. At the time point of interest (6 year old, for example), some children do not participate in the follow-up. Depending on the attrition mechanism (MAR, MNAR), restricting the analysis to the participating children may result in a biased estimation of the association between the exposure and the outcome.

Our work aimed at evaluating through simulations the ability of the IPW method to correct for a selection bias under various missingness mechanisms and specifications of the response model. Response model specifications were compared in terms of bias, variance and mean square error of the association estimates between the exposure and the outcome.

In all scenarios tested, we assumed that the exposure variable and the other covariates were fully observed at preceding waves or at baseline.

Which variables should be introduced into the response model?

Relatively few authors have addressed the question of which variables should be introduced in the response model from which the weighting is derived. In 2004, Hernan et al. recommended including the exposure variable and all variables that independently predict both response and outcome [3]. In 2013, Seaman and White advised not including variables that are exclusively related to the response without being related to the outcome and exposure variables. They suggested adding confounding variables (i.e. associated with both exposure and outcome) and prognostic variables (i.e. exclusively associated with the outcome) of the association studied [4]. Seaman and White’s recommendations are consistent with simulation studies performed in the propensity score literature [12, 13]. In this context, including variables that are related to the exposure but not the outcome is discouraged. Variables unrelated to the exposure but related to the outcome should instead be included. The literature on the IPEW method considers only two variables: the exposure and the outcome, whereas the IPPW method involves the response as a third variable. Thus, there is still uncertainty as to whether the inclusion of confounding and prognostic variables in the response model depends on whether or not they are associated with the response. Furthermore, if the exposure variable X is itself associated with the response, should it be included in the response model?

A quick glance at the applied literature shows that researchers usually build the response and association models independently. They often fit a logistic regression model by including the presumed predictors of response, whether or not they are related to exposure or outcome [14, 15]. Their approach is thus primarily to fit a model that perfectly predicts the response and not to optimize the response model in relation to the association model.

None of the proposed strategies in the literature has been tested through simulations and they do not appear to be applied by researchers. Therefore, we propose here to provide insight on this issue by studying the impact of the type of variables included in the response model on the bias and variance of the exposure regression coefficient in the association model.

Simulation study

We conducted a Monte-Carlo simulation study under several MAR and MNAR scenarios. We aimed to evaluate i) the relative performance of the IPPW method relative to CCA and ii) how the specification of the response model in the IPPW method affects the bias of the exposure regression coefficient \(\hat{\beta}\), its variance and mean square error, and the coverage rate of confidence intervals.

The SAS code to implement the simulation study is available in Additional file 1: e-Appendix 1.

We focused on the case of a linear regression model in which a continuous outcome is explained by continuous exposure and covariates.

Data-generating process

We first created a sample of size n = 1,000, containing seven covariates z1, …, z7 generated independently according to standard normal distributions. We then generated an exposure variable according to the following model:

$${x}_i=1+{\alpha}_1\ {z}_{1i}+{\alpha}_2\ {z}_{2i}+{\alpha}_5\ {z}_{5i}+{\alpha}_6{z}_{6i}+{\epsilon}_i,$$

where ϵi is generated according to a standard normal distribution. In the exposure model (1), the coefficients were chosen as α1 = α2 = α5 = α6 = 0.218, so that the correlation between xi and each of the covariates zi was approximately 0.2. We generated an outcome variable according to the following model:

$${y}_i=1+\beta {x}_i+{\beta}_1\ {z}_{1i}+{\beta}_3\ {z}_{3i}+{\beta}_5\ {z}_{5i}+{\beta}_7\ {z}_{7i}+{\epsilon}_i^{\prime },$$

where ϵi is generated according to a standard normal distribution. In the outcome model (2), the coefficients were chosen as β =0.5, β1 = β5 = 0.170, and β3 = β7 = 0.230, such that the correlation between yi and xi was approximately 0.3,and the correlation between yi and any of the covariates zi was approximately 0.2. Finally, we generated response probabilities according to the following logistic model:

$$\mathrm{logit}\left({p}_i\right)={\gamma}_0+{\gamma}_y\ {y}_i+{\gamma}_x\ {x}_i+{\gamma}_1\ {z}_{1i}+{\gamma}_2\ {z}_{2i}+{\gamma}_3\ {z}_{3i}+{\gamma}_4\ {z}_{4i},$$

We used the values γy = 0.0, 0.2 or 0.5 and γx = 0.0, 0.2 or 0.5. The case in which γy =0.0 corresponds to a MAR situation (i.e. the response probability does not depend on yi). The cases in which γy =0.2 and 0.5 correspond to MNAR situations (i.e. the response probability depends on yi). In the response model (3), the coefficients γ1, γ2, γ3, and γ4 were chosen to be equal to 0.1. The coefficient γ0 was chosen such that the average response rate was approximately 60% for all cases. In the sample, the individuals responded independently with the probabilities pi. The data-generation model is presented in Fig. 1 and the nine response mechanism scenarios are summarized in Table 1.

Fig. 1
figure 1

Scheme of the data-generation model for the simulation experiments. Seven covariates, differing in their association with the variables of interest (exposure and outcome variables) and the response variable, were generated. The strength of the associations (dashed arrows) between the variables of interest and the response variable (γx and γy) varied according to the scenarios described in Table 1

Table 1 Response mechanism scenarios (data generation)

Simulation parameters and performance criteria

We compared the IPPW method to CCA for a parsimonious association model, i.e. including only the confounding variables Z1 and Z5, which corresponds to standard epidemiological practice: \({y}_i=1+{x}_i+{\mathrm{z}}_{1\mathrm{i}}+{z}_{5i}+{\epsilon}_i^{\prime }\).

Several response models were tested (see Table 2) to determine the impact of the type of variables included on the \(\hat{\beta}\) regression coefficient of the exposure variable and its variance in the association model. Briefly, we first evaluated the “well-specified” response model, i.e. the one that included all the variables really related to the response (X, Z1, Z2, Z3, Z4), as simulated (Eq. 1). We also initially included the exposure variable X, although this variable was not associated with the response in certain tested scenarios (MAR 1, MNAR 1, MNAR 4). We then tested a model including all available variables. Then, we assessed the proposals by Hernan et al. (2004) and by Seaman and White (2013), described above [3, 4]. Finally, we evaluated parsimonious strategies: including only the confounding variable associated with the response, including only the confounding variable not associated with the response, including both, including both with the addition of a prognostic variable not associated with the response, and finally, including both confounding and prognostic variables not associated with the response. All these response models were then evaluated without the exposure variable X.

Table 2 Response models tested

The generation of the sample and variables was repeated B = 10,000 times. For each sample, we computed the \(\hat{\beta}\) regression coefficient and its variance according to the 18 possible response models. The simulations were conducted using SAS version 9.4.

The results were assessed according to the following criteria:

  • The Monte Carlo bias:

    $${B}_{MC}\left({\hat{\beta}}_x\right)=\frac{1}{\mathrm{10,000}}\sum_{b=1}^{\mathrm{10,000}}\left({\hat{\beta}}_x^b-\beta \right)$$
  • The Monte Carlo variance:

    $${V}_{MC}\left({\hat{\beta}}_x\right)=\frac{1}{\mathrm{10,000}-1}\sum_{b=1}^{\mathrm{10,000}}{\left({\hat{\beta}}_x^b-\overline{\hat{\beta_x}}\right)}^2\ with\ \overline{\hat{\beta_x}}=\frac{1}{\mathrm{10,000}}\sum_{b=1}^{\mathrm{10,000}}{\hat{\beta}}_x^b.$$
  • The mean square error:

    $${MSE}_{MC}\left({\hat{\beta}}_x\right)=\frac{1}{\mathrm{10,000}-1}\sum_{b=1}^{\mathrm{10,000}}{\left({\hat{\beta}}_x^b-\beta \right)}^2.$$
  • The relative root mean square error:

    $${\boldsymbol{RRMSE}}_{\boldsymbol{MC}}\left({\hat{\boldsymbol{\beta}}}_{\boldsymbol{x}}\right)=\mathbf{100}\times \frac{\sqrt{{\boldsymbol{MSE}}_{\boldsymbol{MC}}\left({\hat{\boldsymbol{\beta}}}_{\boldsymbol{x}}\right)}}{\boldsymbol{\beta}}.$$

The Monte Carlo variance is the variance of the estimates over 10,000 replications. Therefore, it accounts for the entire variability of the estimators, including the fact that the weights are estimated. We have also computed the coverage rates for the normality-based confidence intervals for \({\hat{\boldsymbol{\beta}}}_{\boldsymbol{x}}\), with nominal rates of 2.5% in each tail.

Results of the simulation study

The simulation results are reported in Tables 3, 4 and 5 for the Monte Carlo bias, variance, mean square error and relative root mean square error, and in Table 6 for the coverage rates.

Table 3 Simulation study results: bias, variance, mean square error and related root mean square error in the \(\hat{\beta}\) regression coefficient for CCA and the IPPW method (18 response models), for three MAR response mechanism scenarios
Table 4 Simulation study results: bias, variance, mean square error and related root mean square error in the \(\hat{\beta}\) regression coefficient for CCA and the IPPW method (18 response models), for three MNAR response mechanism scenarios
Table 5 Simulation study results: bias, variance, mean square error and related root mean square error in the \(\hat{\beta}\) regression coefficient for CCA and the IPPW method (18 response models), for three MNAR response mechanism scenarios
Table 6 Simulation study results: coverage rate of the normality-based confidence interval for the \(\hat{\beta}\) regression coefficient for CCA and the IPPW method (18 response models), for nine response mechanism scenarios

Bias in the \(\hat{\boldsymbol{\beta}}\) regression coefficient

We observed no bias with either CCA or the IPPW method for the three MAR scenarios and MNAR scenario 1 (γx = 0.0, γy = 0.2). A bias occurred with both methods for the five other MNAR scenarios, with a greater amplitude for MNAR scenarios 5 (γx = 0.2, γy = 0.5) and 6 (γx = 0.5, γy = 0.5). The bias was globally equivalent between CCA and the IPPW method for these five scenarios. Within the IPPW method, all response models tested showed the same bias pattern across all MAR and MNAR scenarios. For MNAR scenarios 1 to 3, the absolute bias increases as γx increases. Similarly, the absolute bias increases as γx increases for MNAR scenarios 4 to 6.

Variance of the \(\hat{\boldsymbol{\beta}}\) regression coefficient

The IPPW method was less efficient than CCA for all scenarios. We observed an increase in variance with increasing correlation between the exposure variable X and the response (illustrated by Figs. 2 and 3). The loss of efficiency of the IPPW method was thus particularly pronounced in MAR scenario 3 and MNAR scenarios 3 and 6 (all three characterized by γx = 0.5).

Fig. 2
figure 2

Monte-Carlo variance obtained with CCA and the IPPW method (response models 1 and 10, see Table 2) for the nine response mechanism scenarios. The variance increased as the correlation between the exposure variable and the response variable increased for both methods. The variance was consistently higher with the IPPW method than with CCA in all scenarios. With the IPPW method, variance inflation was particularly observed when the exposure variable X was put into the response model

Fig. 3
figure 3

Monte-Carlo variance obtained with CCA and the IPPW method (response models 6 and 15, see Table 2) for the nine response mechanism scenarios. The variance increased as the correlation between the exposure variable and the response variable increased for both methods. The variance was consistently higher with the IPPW method than with CCA in all scenarios. With the IPPW method, variance inflation was particularly observed when the exposure variable X was put in the response model. On the other hand, removal of variable X (covariate Z5 only) resulted in the variance obtained with the IPPW method being very close to that obtained by CCA

The response models that enabled a reduction in the variance were those in which the exposure variable X was removed (see Figs. 2 and 3). This was particularly observed in scenarios in which the exposure variable X was associated with the response, but it was also observed in MAR scenario 1 and MNAR scenarios 1 and 4 (all three characterized by γx=0.0). Among the response models without the exposure variable X, the response model that further reduced the variance was that which included only the variable Z5 (confounding variable not associated with the response). Nevertheless, response models including the variables Z5, Z7 (a confounding variable and a prognostic variable, neither associated with the response), and only Z1 (confounding variable associated with the response) also showed good performance in terms of precision. Overall, the gain in precision obtained with all these response models did not enable us to reach the level of precision obtained using CCA.

Mean square error of the \(\hat{\boldsymbol{\beta}}\) regression coefficient

For the MAR scenarios, all the tested estimators are unbiased and there is therefore no difference between the variance and the mean square error (see Table 3). For MNAR scenarios 1 to 3, the mean square error increases with γx, i.e. when the correlation between the exposure variable and the response increases. This also holds true for MNAR scenarios 4 to 6.

Coverage rates

The coverage rates are well respected for all the MAR scenarios and for MNAR scenario 1. For MNAR scenarios 1 to 3, the coverage decreases as the correlation between the exposure variable and the response increases. This also holds true for MNAR scenarios 4 to 6. The coverage rates are poorly respected for MNAR scenarios 5 and 6.

Illustrative example

As an example, we analyzed the association between pre-pregnancy maternal BMI with the child’s BMI at age 7 in TIMOUN, a prospective mother-child cohort study conducted in the Guadeloupe archipelago (French West Indies) [16].

Study population and data collection

Between November 2004 and December 2007, 1068 pregnant women were enrolled in TIMOUN by obstetricians during their second- or third-trimester prenatal visit at public hospitals or at a local dispensary. At inclusion, women were interviewed by trained midwives to assess their medical history, socioeconomic conditions, and lifestyle. At birth, information concerning maternal diseases during pregnancy, health status of the newborn, and details of the delivery was also collected [17]. In total, 1033 single live births were registered. Several follow-ups were organized within a selected subsample of the children at 3, 7, and 18 months of age [18, 19]. When the children were 7 years of age, all the mothers initially included were invited to participate in a new follow-up which consisted of an interview of the mothers and a medical examination of the children. Among the 1033 mother-child couples initially included, 592 participated in this second wave, representing 57% of the initial sample. Weight was not measured for two children examined at age 7, resulting in a final population of 590 for the association studied (see detailed flow-chart in Additional file 1: e-Appendix 2).

Outcome and exposure

The exposure of interest was the pre-pregnancy maternal BMI (kg/m2). It was calculated from the mothers’ self-reported weight and height before pregnancy at inclusion in the cohort. The outcome of interest was the child’s BMI at 7 years. It was calculated from measurements performed during a medical examination at 7 years.


The covariates considered in the analysis were maternal age at birth (continuous), maternal educational level (< 5 years, 5–12 years, > 12 years), maternal place of birth (French West Indies, other Caribbean island, Europe), non-gestational maternal diabetes (yes, no), enrollment site (university hospital, local hospital, antenatal care dispensary), maternal alcohol consumption during pregnancy (yes, no), maternal smoking during pregnancy (yes, no), sex of the child (boy, girl).

The proportion of missing data within these covariates did not exceed 3%, except for maternal alcohol consumption during pregnancy (5.6%). For the variables with missing values, a single imputation by the modal value was previously performed.

The directed acyclic graph (DAG) on which we based our analyses is presented in Fig. 4. All arrows were placed according to a priori knowledge. In our study, the DAG approach did not identify all the types of covariates Z presented in the simulation study: no variables of type Z3, Z5, or Z6 were present in our example. In this didactic example, we assume a situation equivalent to the MAR 1 scenario in the simulation study (i.e. the response at 7 years depends neither on the exposure nor the outcome, but only on the covariates).

Fig. 4
figure 4

Directed acyclic graph (DAG) of the known or assumed associations between variables of the illustrative example. For the sake of simplicity and clarity, the arrows representing the associations between the covariates are not drawn. Not all types of variables considered in the simulation study were suitable for this illustrated example. The covariates ‘maternal educational level’ and ‘maternal place of birth’ were considered to be confounding factors in the relationship between pre-pregnancy maternal BMI and child BMI at age 7. The association models were therefore adjusted for these covariates

A linear regression model was fitted with an a priori adjustment for maternal education and maternal place of birth (confounding variables). Both CCA and the IPPW method were applied, the latter using several response models.

The analyses were performed using R 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria). The standard errors were computed taking into account the weight estimation phase, according to the method described by Metten et al. [20].

The R code to implement the CCA and IPPW analyses and a training dataset are available in Additional file 1: e-Appendix 3.


The β coefficients related to the exposure of interest were very similar between CCA and the IPPW method (Table 7). Within the IPPW results, the most effective response model strategy was the one including only Z1 variables (maternal educational level and maternal place of birth).

Table 7 Adjusted association between pre-pregnancy maternal BMI and child BMI at age 7 (CCA and IPPW method)


Attrition is a major methodological issue in cohort studies. It challenges the validity of association analyses because its occurrence is generally not completely at random. Several authors have proposed the IPPW method to correct for potential selection biases [3, 4]. However, little evaluation of the method has been performed and there is little guidance for researchers who wish to apply it, in particular for the specification of the response model.

Our simulation study showed no superiority of the IPPW method over CCA in terms of bias, and it even led to a loss of efficiency. Both were similarly unbiased in the MAR scenarios and similarly biased in most MNAR scenarios. For the MNAR scenarios, the absolute bias increased as the correlation between the exposure and the response increased. As a result, the mean square error is high for these scenarios when γx = 0.5. In addition, because the bias is negative, the confidence intervals are shifted to the left and the nominal error rates are poorly respected.

These results are consistent with those observed in a study comparing several methods of handling attrition in a simulated cohort of 300 subjects [21]. In this study, the authors concluded that CCA produces results as valid as those obtained with the other compared methods, which included the IPPW method. It is worth noting that the IPPW method consists in reweighting the study population with complete data, meaning that both CCA and IPPW methods are based on the same sub-population. Therefore, a difference in efficiency cannot be attributed to a varying sample size. One explanation for the loss of efficiency observed with the IPPW method lies in the fact that adding covariates in the response model tends to increase the variability of estimated weights.

We chose to solely compare the IPPW method to CCA. However, there are also other approaches, including imputation methods, which consist of replacing missing values with plausible ones. Multiple imputation (MI) is an advanced imputation method that has steadily improved and gained popularity in recent years [22, 23]. It consists of imputing the dataset several times by using adapted models that include the collected variables. However, imputation methods are mainly used for missing covariates in situations of partial non-response. Seaman and White emphasized that it may be potentially dangerous to use MI in situations of total non-response [4]. The risk of mis-specifying the imputation model would be high because it requires the imputation of all missing variables of a given survey wave, without auxiliary information at the time of the survey wave. The results of Lewin et al. also showed that MI was no better than CCA in situations of attrition that lead to a missing outcome [24]. This is also consistent with the findings of Kristman et al. [21].

Our study aimed also to assess the impact of the choice of the variables included in the response model on the bias of the exposure regression coefficient and its variance. The various response models tested did not change the bias patterns, which is consistent with what has been observed in the literature on the propensity score method. Indeed, Brookhart et al. showed that the issue of the choice of variables resided essentially in the variance, not in the bias [12]. The strategy of not including variables associated with the exposure in the propensity score, but rather confounding and prognostic variables, improved the precision of the estimates without increasing the bias.

In our study, we show that it is preferable not to include the exposure variable in the response model. Otherwise variance inflation would be observed, which is not in line with the proposal of Hernan et al. [3]. Paradoxically, this phenomenon was particularly pronounced in scenarios in which the exposure variable was associated with the response. This can possibly be explained by over-fitting because the exposure variable is present in the association model.

Within response models without the exposure variable, the minimalist strategy, consisting of including only the confounding variable unrelated to the response, resulted in the lowest estimated variance. Close response models (in order of best precision: inclusion of both confounding and prognostic variables unrelated to the response; inclusion of the confounding variable related to the response) also performed well in terms of precision. Thus, parsimonious strategies using the same variables as the association model (except the exposure variable) were the most effective. This was also observed in our illustrated example. The strategy to optimize the response model when using the IPW method to limit a selection bias (IPPW) is thus the same as that recommended in situations in which the IPW method is used to limit a confounding bias (propensity scores).

The strategy for constructing the response model requires clear identification of the role played by the variables. This can be based on a structural approach using DAGs, as we did in our example in Section 5 [3]. DAGs are causal analysis tools originally designed to assist in the selection of variables in an association model [25]. They make it possible to control for a confusion bias and avoid over-adjustment in the association model. Within the framework of the propensity score method, Austin and Stuart recommended using them to identify sets of variables to be included in the propensity score [26]. Similarly, it can be useful to guide the variable selection in the response model. Indeed, it allows researchers to better visualize the relationships between all the variables involved in the association of interest (exposure, outcome, response, covariates) and thus enables optimization of the specification of the response and association models.

DAGs are based on a priori knowledge and thus do not protect against misidentification of the role played by the variables. In surveys, rather than weighting individuals by their individual probability of response, the sample is often partitioned into response homogeneity groups (RHGs), i.e. groups that are homogeneous in terms of response probability. The parameters of interest are estimated in each group and then pooled across the groups to obtain an overall parameter. This strategy, while improving the precision of the estimates, protects against possible misspecification of the response model. The RHG method is quite similar to what is referred to as stratification in the context of the propensity score used to reduce a confounding bias. In the context of attrition, Seaman and White proposed a stratified IPPW method, but the stratification was not based on response probability but rather response patterns to survey waves [4]. Once the strata were defined, a response model was fitted independently in each stratum. We are not aware of any use of stratification on the probability of response in association studies based on cohort data, but this may be a new application of an existing method in other contexts (surveys, and propensity scores).


The first strength of our study is that we tested through simulations nine response mechanism scenarios, corresponding to three degrees of correlation between the response variable and our interest variables (exposure, outcome). The parameters chosen were consistent with those observed in the TIMOUN cohort to represent a realistic setting. In addition, we evaluated the impact of several response models on the estimated exposure effect. This has not been previously performed in the literature when the IPW method has been used to reduce a selection bias.


This study also had limitations. First, our simulation framework did not consider binary outcomes, although this is a common situation in epidemiology. However, recent literature indicates that CCA is potentially much less prone to give biased estimates of the exposure coefficient in a logistic regression [27], making the linear regression framework more challenging for evaluating the IPPW method. Second, in our simulations the level of attrition was kept constant, at a quite high but realistic level (40%) for cohort studies. The influence of the attrition rate is however well known, with the expected conclusion that bias and variance increase with the percentage of non-respondents [21]. Moreover, the attrition level we chose was realistic for cohort studies and thus quite high (40%). Third, we did not vary the degree of correlation between the covariates Z and the response or our variables of interest (exposure, outcome). However, Lewin et al. showed that a strong correlation between the outcome and a variable of type Z3 (i.e. associated with the outcome and response) could increase the magnitude of the bias [24]. Finally, our study only considered attrition leading to missing outcomes and fully observed exposure and covariates. However, these variables may also be affected by partial non-response in everyday practice. Seaman proposed a mixed approach to address this problem, combining MI and the IPPW method [28]. Although it has already been used by several epidemiological researchers [29,30,31], such an approach should be further explored, especially for the evaluation of its superiority over a method combining IPPW and single imputation.

Finally, we did not address the consequence of using estimated weights (inverse response probability) in the association models. The usual statistical packages and procedures estimate the standard error of the effect of exposure as if weights are a priori known, ignoring the extra-variability due to their estimation. Consequently, the standard error is biased and may mislead the conclusions about the significance of the effect. We proposed an exact estimation of the variance (linearized variance) that should be used when IPW is implemented (as we did in our illustrative example). The details of the calculations of this variance are available in Metten et al. [20].


Our study suggests that using IPPW to handle attrition in cohort studies does not reduce bias and may result in a loss of efficiency. These results therefore raise questions about the contribution of the IPW method to correcting possible selection bias that occurs in situations of attrition that lead to a missing outcome in association analyses. If the method is to be used, we encourage use of only the confounding variables of the association of interest in the response model.

Availability of data and materials

The SAS code corresponding to the simulation study, the R code to implement an IPPW analysis, and a training dataset are provided in the Additional file 1 of the article.

The dataset from the TIMOUN cohort used as an illustrative example in this article cannot be made openly available due to ethical concerns. The TIMOUN team can provide the data on request, after appropriate approvals. Requests should be submitted to Dr. Luc Multigner.


  1. Spiers S, Oral E, Fontham ETH, Peters ES, Mohler JL, Bensen JT, et al. Modelling attrition and nonparticipation in a longitudinal study of prostate cancer. BMC Med Res Methodol. 2018;18(1):60.

    Article  Google Scholar 

  2. Little R, Rubin D. Statistical analysis with missing data. New York: Wiley; 2002.

    Book  Google Scholar 

  3. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–25.

    Article  Google Scholar 

  4. Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013;22(3):278–95.

    Article  Google Scholar 

  5. Hense S, Pohlabeln H, Michels N, Mårild S, Lissner L, Kovacs E, et al. Determinants of attrition to follow-up in a multicentre cohort study in children-results from the IDEFICS study. Epidemiol Res Int. 2013;2013:1–9.

    Article  Google Scholar 

  6. Weuve J, Tchetgen Tchetgen EJ, Glymour MM, Beck TL, Aggarwal NT, Wilson RS, et al. Accounting for bias due to selective attrition: the example of smoking and cognitive decline. Epidemiology. 2012;23(1):119–28.

    Article  Google Scholar 

  7. Rabideau DJ, Nierenberg AA, Sylvia LG, Friedman ES, Bowden CL, Thase ME, et al. A novel application of the Intent to Attend assessment to reduce bias due to missing data in a randomized controlled clinical trial. Clin Trials J Soc Clin Trials. 2014;11(4):494–502.

    Article  Google Scholar 

  8. Biele G, Gustavson K, Czajkowski NO, Nilsen RM, Reichborn-Kjennerud T, Magnus PM, et al. Bias from self selection and loss to follow-up in prospective cohort studies. Eur J Epidemiol. 2019;34(10):927–38.

    Article  CAS  Google Scholar 

  9. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.

    Article  Google Scholar 

  10. Gignac F, Romaguera D, Fernández-Barrés S, Phillipat C, Garcia Esteban R, López-Vicente M, et al. Maternal nut intake in pregnancy and child neuropsychological development up to 8 years old: a population-based cohort study in Spain. Eur J Epidemiol. 2019;34(7):661–73.

    Article  CAS  Google Scholar 

  11. Pryor L, Strandberg-Larsen K, Nybo Andersen A-M, Hulvej Rod N, Melchior M. Trajectories of family poverty and children’s mental health: results from the Danish National Birth Cohort. Soc Sci Med. 2019;220:371–8.

    Article  Google Scholar 

  12. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149–56.

    Article  Google Scholar 

  13. Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med. 2006;26:734–53.

    Article  Google Scholar 

  14. Edwards AC, Joinson C, Dick DM, Kendler KS, Macleod J, Munafò M, et al. The association between depressive symptoms from early to late adolescence and later use and harmful use of alcohol. Eur Child Adolesc Psychiatry. 2014;23(12):1219–30.

    Article  Google Scholar 

  15. Widen EM, Whyatt RM, Hoepner LA, Mueller NT, Ramirez-Carvey J, Oberfield SE, et al. Gestational weight gain and obesity, adiposity and body size in African-American and Dominican children in the Bronx and Northern Manhattan: gestational weight gain and child adiposity. Matern Child Nutr. 2016;12(4):918–28.

    Article  Google Scholar 

  16. Kadhel P, Monfort C, Costet N, Rouget F, Thomé J-P, Multigner L, et al. Chlordecone exposure, length of gestation, and risk of preterm birth. Am J Epidemiol. 2014;179(5):536–44.

    Article  Google Scholar 

  17. Hervé D, Costet N, Kadhel P, Rouget F, Monfort C, Thomé J-P, et al. Prenatal exposure to chlordecone, gestational weight gain, and birth weight in a Guadeloupean birth cohort. Environ Res. 2016;151:436–44.

    Article  Google Scholar 

  18. Boucher O, Simard M-N, Muckle G, Rouget F, Kadhel P, Bataille H, et al. Exposure to an organochlorine pesticide (chlordecone) and development of 18-month-old infants. Neurotoxicology. 2013;35:162–8.

    Article  CAS  Google Scholar 

  19. Dallaire R, Muckle G, Rouget F, Kadhel P, Bataille H, Guldner L, et al. Cognitive, visual, and motor development of 7-month-old Guadeloupean infants exposed to chlordecone. Environ Res. 2012;118:79–85.

    Article  CAS  Google Scholar 

  20. Metten M-A, Costet N, Viel J-F, Chauvet G. Reflection on modern methods: a note on variance estimation when using inverse probability weighting to handle attrition in cohort studies. Prepr Serv HAL. 2021; Available from:

  21. Kristman VL, Manno M, Côté P. Methods to account for attrition in longitudinal data: do they work? A simulation study. Eur J Epidemiol. 2005;20(8):657–62.

    Article  Google Scholar 

  22. Rubin D. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.

    Book  Google Scholar 

  23. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–99.

    Article  Google Scholar 

  24. Lewin A, Brondeel R, Benmarhnia T, Thomas F, Chaix B. Attrition bias related to missing outcome data: a longitudinal simulation study. Epidemiology. 2018;29(1):87–95.

    Article  Google Scholar 

  25. Greenland S, Pearl J, Robins J. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.

    Article  CAS  Google Scholar 

  26. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34(28):3661–79.

    Article  Google Scholar 

  27. Hughes RA, Heron J, Sterne JAC, Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int J Epidemiol. 2019;48(4):1294–304.

    Article  Google Scholar 

  28. Seaman SR, White IR, Copas AJ, Li L. Combining multiple imputation and inverse-probability weighting. Biometrics. 2012;68(1):129–37.

    Article  Google Scholar 

  29. Afshar M, Wu D, Arvizu RD, Aguilar FG, Kalhan MDR, Davis SM, et al. Association of serum lipids and obstructive lung disease in Hispanic/ Latino adults of diverse backgrounds. J Pulm Respir Med. 2017;07(04):419.

    Article  Google Scholar 

  30. Green MJ, Stritzel H, Smith C, Popham F, Crosnoe R. Timing of poverty in childhood and adolescent health: evidence from the US and UK. Soc Sci Med. 2018;197:136–43.

    Article  Google Scholar 

  31. Mahedy L, Field M, Gage S, Hammerton G, Heron J, Hickman M, et al. Alcohol use in adolescence and later working memory: findings from a large population-based birth cohort. Alcohol Alcohol. 2018;53(3):251–8.

    Article  Google Scholar 

Download references


The authors would like to thank Mélanie Prague and Etienne Dantan for fruitful discussions on methodology.


This work was supported by grants from the French Ministry of Health, General Health Directorate (N° DGS RMC11129NNA & R17142NN), and the Fondation de France (N° 69263).

Author information

Authors and Affiliations



L.M. contributed to the acquisition of data from the TIMOUN cohort. Analyses were performed by M.A.M. and G.C. The first draft of the manuscript was written by M.A.M. and co-authors commented on previous versions of the manuscript and amended it. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guillaume Chauvet.

Ethics declarations

Ethics approval and consent to participate

The TIMOUN cohort was approved by the Guadeloupean Ethics Committee for studies involving human subjects, and detailed informed consent was obtained from each woman (at inclusion) and from parents of each participating child (at the 7 year-follow-up).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: e-Appendix 1.

SAS code to implement the simulation study. e-Appendix 2. Detailed flow-chart of the TIMOUN cohort. e-Appendix 3. R Script to implement an IPPW analysis - Illustrative example (demo dataset “dt.csv”).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Metten, MA., Costet, N., Multigner, L. et al. Inverse probability weighting to handle attrition in cohort studies: some guidance and a call for caution. BMC Med Res Methodol 22, 45 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: