
Table 3 Summary of the simulation study or studies conducted in each paper, including the key findings and conclusions of the authors. ICC – Intra-cluster correlation coefficient; LATE – Local average treatment effect; SE – Standard error; SSDF – Small sample degrees of freedom; CI – Confidence interval; 2SLS/TSLS – Two-stage least squares; CL – Cluster level; CP – Coverage probability; ITT – Intention to treat; PP – Per protocol; AT – As treated; IV – Instrumental variable; CACE – Complier average causal effect; MSE – Mean squared error; 2SRI – Two-stage residual inclusion; 2SPS – Two-stage predictor substitution; IP-weighted – Inverse probability weighted; ER – Exclusion restriction; OR – Odds ratio; RCT – Randomised controlled trial; RMSE – Root mean squared error; C-Prophet – Compliers proportional hazards effect of treatment; CALM – Causal accelerated life model; OLS – Ordinary least squares; GEE – Generalised estimating equations; HLM – Hierarchical linear model; SD – Standard deviation; LTGM – Latent treat grizzle model; HR – Hazard ratio

From: A systematic review of simulation studies which compare existing statistical methods to account for non-compliance in randomised controlled trials

Paper

Compliance scenarios varied

Other scenarios varied

Performance measures

Key results

Agbla et al. (2020) [23]

o Non-compliance at either cluster or individual level

o Expected probability of compliance differs between these

o Effect of individual and cluster level variables on odds of adherence is varied

o Vary number of clusters, average cluster size, ICC, strength of confounding and true value of LATE

o Effect of individual and cluster level variables on outcome is varied

o Within analysis, consider different methods for weighting, SE estimation and SSDF correction

o Empirical bias + Monte Carlo SE

o Coverage rates of 95% CIs

o Shows that TSLS regression applied to CL summaries is a simple, valid method for obtaining LATE estimates.

o All weighting strategies perform similarly when number of clusters is not small.

o Minimum-variance weights generally perform well unless there are very few clusters or outcome ICC is large.

o Cluster-size weights should not be used when cluster sizes are variable.

o Authors give a useful table of recommendations for different adherence scenarios.

Bang and Davis (2007) [31]

o Non-compliance scenarios considered

- Can occur in either treatment arm

- Only in intervention arm

- Partial compliance

o Within each of these, also varied whether non-compliance was ignorable or symmetric/asymmetric

o Considered two different true treatment effects

o Mean

o Sum of squared errors

o Coverage probability

o IV estimator behaves best and improves upon ITT in terms of bias and CP.

o However, bias of IV not always negligible and IV can be as problematic as PP and AT depending on underlying scenario, except in the hypothetical setting of a constant treatment effect.

o Identify a trade-off between increased information and more reliable statistical properties, since IV requires additional, accurate information and verification of underlying assumptions, which the ITT does not.

Cai et al. (2011) [32]

o Varied probabilities of being an always-taker, complier or never-taker

o Magnitude of confounding is also varied

o Observed bias

o MSE

o Confirm results of previous papers, which show that the 2SRI estimator is unbiased when the true model is conditional on the unmeasured confounder and that, for the treatment effect conditional on compliance, the 2SRI bias increases with the magnitude of confounding.

o Similar results hold for the 2SPS estimator, except that 2SPS is biased even when there is no unmeasured confounding.

o This bias occurs even when all IV assumptions are met.

Cuzick et al. (1997) [21]

o Varied rate of non-compliance and contamination

o Varied benefit of treatment, randomisation ratio and total trial population

o Bias

o Confidence intervals

o “Corrected method” produces larger treatment effects than ITT when baseline failure rates in non-compliers and contaminators are the same as in those who accept their allocated treatment; its confidence limits are also wider.

o “Corrected method” provides a better estimate of the true treatment effect and more realistic confidence intervals.

Hampson and Metcalfe (2012) [27]

o Proportion of noncompliers varied

o Varied whether effects of compliance on hazard of mortality were strong or weak

o Considered whether compliance indicator and important baseline covariates were correlated or independent

o Also considered models both adjusted and unadjusted for baseline covariates

o Mean

o Percentage bias

o Coverage of 95% CI

o Power

o Methods of estimating causal treatment effects for time-to-event outcomes can be extended to incorporate covariates.

o All three methods are accurate when an important covariate was included in the model, with a maximum bias of 5.4%.

o When there are strong prognostic factors, it is important to adjust efficacy estimates for them in order to avoid bias, whether or not these factors are associated with compliance.

o Generally, it is hard to regain power for testing causal treatment effects, no matter how sophisticated the method of analysis.

Hossain and Karim (2022) [22]

o Varied nonadherence rate

o Considered weak and strong confounding, null and non-null effect and minor or severe violation of the exclusion-restriction assumption.

o Bias

o SE

o MSE

o 95% confidence interval coverage probability

o No single method is the best in all situations.

o Both-stage adjusted 2SLS and 2SRI perform well in terms of bias and coverage when known confounders are adjusted for and this has improved precision over the naïve approach.

o IP-weighted PP outperforms these approaches in terms of bias, SE and MSE for nonadherence below 80%, but shows high bias for nonadherence above this and performs less well when there is unmeasured confounding.

o All methods can have bias when the ER assumption is violated. However, baseline-adjusted PP and IP-weighted PP can be unbiased if all open backdoor paths between the treatment variable and the outcome can be blocked.

Jimenez et al. (2017) [33]

o Level of treatment switching/noncompliance varied

o Crossover considered from both treatment groups

o Compliance considered as both random and based on diagnosis

o Varied risk score effect size, OR for death and OR for high coronary artery disease risk

o Bias

o Root MSE

o CI coverage probability

o Empirical power

o PP analysis can provide biased model estimates when non-compliance is not random.

o ITT analysis generally gives more biased estimates with lower coverage probabilities and lower power in some cases compared to IV as levels of treatment group switching increase.

o IV performed better than ITT in most cases where there was a treatment effect but ITT was slightly better in the null case, although IV was just as good at low levels of switching.

o IV can have higher model estimate variance and greater CI widths as rate of switching increases, which is a trade-off for accurately estimating a true treatment effect whilst preserving an RCT's randomisation.

Korhonen et al. (1999) [28]

o Varied non-compliance rate and whether it is dependent on outcome or not

o Varied treatment effect

o Treatment-free survival considered as both dependent and independent of time on active treatment

o MSE

o Coverage of 95% CI

o Power

o Bias

o ITT analysis often gives estimates that are biased towards the null but is valid for testing purposes since, provided the study has sufficient power, ITT would reject the null hypothesis if a true treatment effect existed.

o AT approach can be misleading when non-compliance is outcome dependent.

o G-estimation provides valid estimates when the underlying structural model is correct, even when non-compliance is outcome dependent. However, it introduces extra censoring and hence induces a loss of power.

Merrill and McClure (2015) [34]

o Range of different noncompliance scenarios considered using different distributions (beta and uniform)

o Allowed compliance to be both independent and not independent of other factors

o Range of cutoff points considered since partial compliance was dichotomized

o Considered both a two-arm trial design and a factorial design

o Considered null and true treatment effect

o Average bias

o MSE

o Power

o Use of PP and AT provides little benefit over ITT when compliance is dichotomized, whilst use of IV in this case often led to unacceptably inflated type I error rate.

o This may also be the case for PP and AT, especially if the compliance distribution does not cluster around 0 or 1.

o Results for factorial design similar to two-arm trial. Increased burden for participants mainly affected results through increased levels of overall non-compliance in study population.

Moerbeek and Schie (2018) [24]

o Level of non-compliance varied

o Non-compliance considered at the subject and cluster level

o ICC, cluster size and number of clusters varied

o Each data set analysed with and without a covariate effect

o Mean estimate compared to true effect (bias)

o Standard deviation

o Coverage

o Power

o Partial F statistics for IV method

o Non-compliance may result in severely biased results.

o AT and PP may underestimate population value of target estimand when covariate not included in model, and this becomes more severe as the probability of non-compliance increases.

o Standard errors of AT, PP and IV increase with level of non-compliance.

o In general, results get worse when probability of non-compliance increases and when covariate that influences compliance is not included in statistical model.

o Conclude that avoiding non-compliance is best but where this is not possible, covariates related to compliance should be included in the statistical model.

Odondi and McNamee (2010) [29]

o Non-compliance considered to be both random and non-random

o Correlation between non-compliance and hazard (how much it depends on a patient’s condition) also varied

o Considered two different treatment effects

o Bias

o SE

o RMSE

o 95% CI coverage

o While the time-dependent method is adequate under random compliance, it breaks down under non-random compliance, with the bias related to the magnitude and direction of correlation between risk and probability of non-compliance.

o All specialist methods performed well in terms of bias, even C-Prophet, which took compliance as all-or-nothing, but coverage of this method was low.

o CALM performed best in terms of bias and coverage but had largest RMSE.

o G-methods may be more valuable in general, as they can be extended to explore lagged treatment effects, for example.

Roberts (2021) [35]

o Consider different values of the ratio of variance of compliers in the intervention arm and never-takers in the control arm

o Also vary the difference between control compliers and never-takers and the compliance rate

o Vary ICC and group size

o Bias

o Coverage

o ITT estimates based on random effects model or GEE with exchangeable correlation matrix performed better when using intended group over actual group.

o OLS with robust SEs performed well with both intended and actual group.

o Most CACE models performed well.

o Conclude that it is desirable to record both intended and actual group analyses, as ITT with mixed models can be fitted using intended group with data generation assumptions checked by a causal model using actual group.

o When ITT based on actual group, a worse outcome for never-takers over compliers may allow one to infer that some estimators are biased towards the null treatment effect.

o Generally, the weighting of data by the method of analysis may induce bias where the outcome of subjects in clusters differs from that of subjects who are not.

Schweig et al. (2020) [25]

o Level of non-compliance

o Consider 2 conditions with non-compliance just in the intervention group and 1 with it in both intervention and control groups

o Number of clusters and ICC varied

o Also looked at different values for the proportion of ICC that was attributable to provider effects

o Relative bias

o Relative bias in SEs

o Using the AT cluster in HLM will bias the ITT estimate and using as-assigned cluster will bias the standard error estimates when heterogeneity among clusters is due to heterogeneity in the treatment effects.

o Using OLS/linear regression with two-way cluster adjusted SEs can yield unbiased ITT estimates and consistent SEs regardless of the source of random effects. The authors recommend this method as a replacement for HLM in the setting of non-compliance and cluster switching.

Soltanian et al. (2020) [26]

o Three non-compliance rates considered

o Three sample sizes considered

o Mean – average treatment effect

o SD of simulated estimates

o Empirical bias of simulated estimates

o Simulation study showed that the LTGM model has the lowest bias in all cases.

Stuart and Jo (2015) [36]

o Vary the strength of the relationship between a covariate and compliance

o Consider violation of three key method assumptions:

- Exclusion restriction

- Normality

- Principal ignorability

o Bias

o Empirical SE

o RMSE

o Coverage (95%)

o ER based joint approach appears less sensitive to assumptions.

o Performance of both methods is significantly improved when there are strong predictors of compliance.

o Interestingly, both methods perform particularly well when the assumptions of the other are violated, highlighting the importance of carefully selecting an estimation procedure.

Wan et al. (2015) [30]

o Vary the “strength of confounding”/non-compliance

o Probability of being an always-taker and complier set to three combinations, representing low, medium and high levels of compliance.

o Vary the hazard rate

o Vary the magnitude of unmeasured confounding

o Probability of being assigned to treatment set to 0.1 or 0.5 to reflect both new and relatively established treatments.

o Bias

o MSE

o 2SRI and 2SPS approaches are both biased in estimating the causal HR among compliers, especially when hazard is increasing, even under a moderate amount of unmeasured confounding.

o 2SPS less biased when hazard is decreasing.

o Even when all assumptions are met, both methods could fail to consistently estimate causal HR.

o Recommend exercising caution when interpreting results from two-stage IV survival models.

o Analytic results for bias may help guide researchers in deciding when two-stage IV methods may be reasonably applied.

Ye et al. (2015) [37]

o Vary the type, randomness and degree of non-compliance

 

o Bias

o MSE

o 95% coverage

o Standard ITT is biased under non-compliance when the intervention has a moderate or large effect, but is the optimal approach when estimating a null effect.

o When patients’ non-compliance behaviour was random, the AT, PP, IV and CACE approaches all provided unbiased estimates. For other scenarios, the optimal method varied.

o The authors provide a useful figure to help researchers choose the best method based on the scenarios considered in this paper.
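
The performance measures that recur in the table (bias, empirical SE, MSE and Monte Carlo SE) can be illustrated with a minimal sketch. It is not taken from any of the reviewed papers. It assumes a hypothetical two-arm trial in which non-compliance occurs only in the intervention arm, never-takers have a worse baseline outcome than compliers, and every estimator is compared against the treatment effect among compliers. All function names, parameter values and the data-generating model are illustrative assumptions.

```python
import random
import statistics


def mean(values):
    return sum(values) / len(values)


def simulate_trial(rng, n, p_complier, effect, never_taker_shift):
    """One hypothetical trial: non-compliance in the intervention arm only."""
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                # randomised arm (True = intervention)
        complier = rng.random() < p_complier  # latent compliance type
        d = z and complier                    # treatment actually received
        # Assumed outcome model: never-takers differ at baseline (confounding).
        y = effect * d + (0.0 if complier else never_taker_shift) + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows


def estimate(rows):
    """ITT, PP, AT and IV (Wald ratio) estimates from one simulated trial."""
    y_z1 = [y for z, d, y in rows if z]
    y_z0 = [y for z, d, y in rows if not z]
    itt = mean(y_z1) - mean(y_z0)
    # PP: drop intervention-arm participants who did not receive treatment.
    pp = mean([y for z, d, y in rows if z and d]) - mean(y_z0)
    # AT: compare by treatment received, ignoring randomisation.
    at = mean([y for z, d, y in rows if d]) - mean([y for z, d, y in rows if not d])
    # IV: ITT scaled by uptake in the intervention arm (control uptake is 0 here).
    uptake = mean([1.0 if d else 0.0 for z, d, y in rows if z])
    return {"ITT": itt, "PP": pp, "AT": at, "IV": itt / uptake}


def run_simulation(n_sims=400, n=500, p_complier=0.7, effect=1.0,
                   never_taker_shift=-1.0, seed=2024):
    """Repeat the trial and summarise each estimator's performance."""
    rng = random.Random(seed)
    draws = {"ITT": [], "PP": [], "AT": [], "IV": []}
    for _ in range(n_sims):
        est = estimate(simulate_trial(rng, n, p_complier, effect, never_taker_shift))
        for name, value in est.items():
            draws[name].append(value)
    summary = {}
    for name, values in draws.items():
        emp_se = statistics.stdev(values)
        summary[name] = {
            "bias": mean(values) - effect,         # relative to the complier effect
            "empirical_se": emp_se,                # SD of the simulated estimates
            "mse": mean([(v - effect) ** 2 for v in values]),
            "mc_se_of_bias": emp_se / n_sims ** 0.5,
        }
    return summary


if __name__ == "__main__":
    for name, measures in run_simulation().items():
        print(name, {k: round(v, 3) for k, v in measures.items()})
```

Under these assumptions, ITT would be expected to be attenuated towards the null, PP and AT to be biased because compliance is related to baseline outcome, and the IV estimate to sit close to the complier effect at the cost of a larger empirical SE. Coverage and power, which many of the reviewed studies also report, would additionally require a standard error and confidence interval for each estimator and are omitted here.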