In this study we tested the validity and precision of IV analysis, with treatment preference per hospital as the instrument, for estimating treatment effectiveness compared with traditional adjustment methods, in order to avoid confounding by indication. Our simulation study suggests that in the presence of unobserved confounders and treatment variation between hospitals, IV analysis provides more valid treatment effect estimates than regular covariate adjustment methods that correct for established prognostic factors. However, the IV analysis was considerably less precise than regular adjustment. Furthermore, the adjusted IV analysis, in which we adjusted for a common cause, gave valid results even when treatment preference was correlated with hospital characteristics, whereas regular IV analysis resulted in biased estimates.
Both IV analyses estimate the effect of treatment preference at the hospital level; as a result, we estimate the effect of the treatment preference rather than the effect of the treatment itself. This is a different measure and is therefore not fully interchangeable with the per-patient effect of treatment. Furthermore, although over 15,000 patients were included in the analysis, each simulated hospital was set to have 100 patients, resulting in 150 simulated hospitals. When comparing the different analyses, this leads to a 100-fold difference in the number of units: n = 150 for the IV analyses compared to n = 15,000 for the patient-level analyses.
Unmeasured confounders and treatment preference
We simulated six scenarios that build up in complexity towards what we believe to be the most realistic situation. Scenario 1 is a null scenario: treatment is assigned randomly and there is no treatment effect. Scenario 2 is what we would encounter in a randomized controlled trial: treatment is still assigned randomly, but there is a treatment effect. In scenario 3 we simulated the ideal observational study, where outcome and treatment depend on observed confounders. In scenario 4 the treatment preference is introduced, but an IV analysis is unnecessary in this scenario, since adjustment for confounding already gives valid results. It is only in scenario 5, when unmeasured confounders are incorporated into the simulation, that bias is introduced by using the regular adjustment methods. On the other hand, the scenarios without treatment preference (scenarios 1, 2, 3 and 5) show clearly what the effect of a very weak instrument can be: extremely uncertain and biased estimates from both IV analyses (d and e), which rely solely on measuring the effect of treatment preference. In these cases the estimates are as biased as those from the univariate model. The strength of the instrument can be tested and should always be considered before performing an IV analysis.
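The instrument-strength check mentioned above can be sketched as a first-stage regression of treatment on the instrument. The sketch below uses freshly simulated data with the same dimensions as our study (150 hospitals of 100 patients); the per-hospital preference values and all parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# 150 hospitals of 100 patients each, as in the simulation study
n_hosp, n_per = 150, 100
pref = rng.uniform(0.2, 0.8, n_hosp)       # hypothetical per-hospital treatment preference
hospital = np.repeat(np.arange(n_hosp), n_per)
treated = rng.binomial(1, pref[hospital])  # treatment driven by hospital preference

# First stage: regress treatment on the instrument (the hospital preference)
z = pref[hospital]
X = np.column_stack([np.ones(z.size), z])
beta, *_ = np.linalg.lstsq(X, treated, rcond=None)
resid = treated - X @ beta
r2 = 1 - resid.var() / treated.var()

# First-stage F statistic for a single instrument; F < 10 is a common weak-instrument flag
f_stat = r2 / (1 - r2) * (treated.size - 2)
print(f"first-stage R^2 = {r2:.2f}, F = {f_stat:.0f}")
```

If treatment were assigned independently of the preference, the first-stage R² and F would collapse towards zero, flagging the instrument as too weak to use.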
However, when unobserved confounders are included in the simulated scenarios (scenarios 5 and 6), the regular adjustment methods (b and c), which only take measured confounders into account, do not sufficiently adjust for all confounding. Regular adjustment methods therefore yield invalid treatment effect estimates in scenarios with unmeasured confounders. These scenarios, we believe, are closer to the situations we face in reality, especially in the absence of evidence-based treatment recommendations. Scenario 6 includes a treatment preference per hospital as well as unmeasured confounders. It is impossible to know the degree of bias unmeasured confounders will introduce. Strong observed confounding indicates systematic differences between the treatment groups, and thus suggests that unobserved confounders may exist as well. Insight into the treatment allocation mechanism from expert knowledge (i.e. how doctors decide whether or not to treat) provides additional information on whether unobserved confounders are to be expected. In our study, the variables used as unmeasured confounders serve only to illustrate that bias arises, not to quantify its magnitude.
In scenario 2 we see more conservative estimates for the univariate, propensity score and IV analyses. The differences between the point estimates in a scenario comparable to an RCT can be ascribed to differences in the specific statistical models underlying the analyses. The regression analysis with covariate adjustment estimates the treatment effect at the patient level; this is a conditional effect estimate. The other methods estimate average treatment effects (or the effect of treatment preference); these are marginal treatment effects, which are closer to the null value [11, 29].
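The non-collapsibility of the odds ratio that drives this conditional-versus-marginal difference can be illustrated numerically: with the conditional odds ratio fixed at 2.0 within every stratum of a strong prognostic factor, the marginal odds ratio comes out closer to the null. All parameter values in this sketch are hypothetical:

```python
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
u = rng.normal(0, 2, 1_000_000)    # strong prognostic factor

beta_treat = np.log(2.0)           # conditional log-odds ratio, identical in every stratum
p1 = expit(-1 + beta_treat + u)    # risk if treated
p0 = expit(-1 + u)                 # risk if untreated

# Marginal odds ratio: contrast the population-average risks
m1, m0 = p1.mean(), p0.mean()
marginal_or = (m1 / (1 - m1)) / (m0 / (1 - m0))
print(f"conditional OR = 2.00, marginal OR = {marginal_or:.2f}")
```

The marginal odds ratio ends up between 1 and 2 even though no stratum has an odds ratio below 2; this is a property of the odds ratio itself, not a sign of bias in either analysis.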
Precision
The estimated SEs show lower precision for the IV analyses than for the regular adjustment methods in general. In the unadjusted IV analysis (method d), the point estimates also vary far more than in the regular adjustment methods. The adjustment in the IV analysis seems to solve this. IV analysis will, however, still require a far larger study population than the patient-level approaches to compensate for its lower statistical precision [30, 31].
Conditions for IV analysis
Although IV analysis can result in valid treatment effect estimates, it depends on certain assumptions and conditions that have to be met [11]: (relevance) there is an instrument and this instrument is associated with the exposure; (exclusion restriction) the IV can affect the outcome only through treatment; and (exchangeability) the outcome and the IV do not share a common cause. Violation of these assumptions can lead to different kinds of bias [32].
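Under these three assumptions, the IV effect can be computed with a Wald / two-stage least squares calculation. A minimal sketch with a single binary instrument, a continuous outcome, and hypothetical simulated data (true effect 0.5, confounded by an unmeasured `u`):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15_000
u = rng.normal(size=n)                           # unmeasured confounder
z = rng.binomial(1, 0.5, n)                      # binary instrument (e.g. high- vs low-preference hospital)
p_treat = 0.2 + 0.4 * z + 0.1 * np.clip(u, -1, 1)
treat = rng.binomial(1, p_treat)                 # treatment depends on instrument and confounder
y = 0.5 * treat + u + rng.normal(size=n)         # true treatment effect = 0.5

# Naive regression of y on treatment is biased upwards by u
naive = np.cov(treat, y)[0, 1] / np.var(treat, ddof=1)

# Wald (2SLS) estimator: uses only the variation in treatment induced by z
iv = np.cov(z, y)[0, 1] / np.cov(z, treat)[0, 1]
print(f"naive = {naive:.2f}, IV = {iv:.2f}")
```

Because `z` is independent of `u` (exchangeability) and enters `y` only through `treat` (exclusion restriction), the Wald ratio recovers an estimate near 0.5, while the naive slope is pulled away from it.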
The need for a relevant IV is illustrated clearly in scenarios 4 and 6 in Fig. 3. If hospitals do not base their choice to give treatment on a treatment preference, the IV analysis will estimate only noise. The extremely large SEs of the IV analyses in the scenarios without treatment preference illustrate the extreme case in which the instrument has no effect at all on the treatment. The R² of 0.38 shows that the treatment preference in the simulation does affect treatment, and is therefore relevant.
Further, the exclusion restriction cannot be tested and remains an assumption based on clinical knowledge and the literature [33]. For the placement of intracranial pressure monitors, it is possible that treatment preference also leads to other clinically relevant choices, such as different medication being given to the patient, which could be the true cause of better or worse outcomes. We assume, however, that placing an intracranial pressure monitor does not lead to different treatment choices further down the line, other than those that are unavoidable after the procedure.
As for exchangeability, the IV estimates were shown not to be biased by case-mix differences between hospitals (Table 4, Fig. 2). In our case, however, where we suspect an association between hospital performance and treatment preference, the exchangeability assumption is not met.
We consider it realistic that there would be some (but limited) correlation between treatment policies. In our case it is imaginable that one treatment policy correlates with another, or with certain facilities a hospital may have. The results of the IV analysis without adjustment (d) are therefore in line with what is to be expected and show an overestimation of the treatment (preference) effect. Correcting for hospital seems to be a possible solution for analyzing data with unmeasured confounders as well as a common cause of the IV and the outcome: in analysis e this no longer leads to overestimation of the treatment effect.
In scenario 6, the “true” adjustment (IV analysis with adjustment for both measured and unmeasured confounders) resulted in a point estimate exactly equal to the simulated treatment effect. This shows that, in our simulation, there was no difference between measuring the effect of treatment and measuring the effect of treatment preference. It also shows that any noise that keeps the estimate from being exactly 0.5 is probably due to the case mix of the hospitals.
Adjusting for hospital
Because we suspect correlation between general hospital characteristics and treatment preference, we tested the effect of adjusting for hospital using a fixed effects model. A random effects model was not used because it assumes the covariates in the model to be independent of the exposure; since we explicitly assume they are not independent, this assumption would be violated. In such cases, fixed effects models are generally advised instead of random effects models [34,35,36].
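A fixed effects adjustment of this kind amounts to adding one indicator per hospital, or equivalently demeaning all variables within hospital (the “within” transformation), which removes any hospital-level term, measured or not. A sketch with a continuous outcome and hypothetical parameters, where an unmeasured hospital characteristic is a common cause of preference and outcome:

```python
import numpy as np

rng = np.random.default_rng(3)
n_hosp, n_per = 150, 100
hospital = np.repeat(np.arange(n_hosp), n_per)

quality = rng.normal(0, 1, n_hosp)               # unmeasured hospital characteristic
pref = np.clip(0.5 + 0.2 * quality, 0.05, 0.95)  # preference correlated with quality (common cause)
treat = rng.binomial(1, pref[hospital])
y = 0.5 * treat + quality[hospital] + rng.normal(size=hospital.size)

# Within transformation: subtracting hospital means removes every hospital-level
# term, including the unmeasured quality
def demean(v):
    means = np.bincount(hospital, weights=v) / n_per
    return v - means[hospital]

t_w, y_w = demean(treat.astype(float)), demean(y)
fe_est = (t_w @ y_w) / (t_w @ t_w)
print(f"fixed-effects estimate = {fe_est:.2f}")  # close to the simulated effect of 0.5
```

A naive regression of `y` on `treat` in these data would be biased by `quality`; the within estimator is not, because all between-hospital variation is differenced out.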
Strengths and limitations
A limitation of this study is that it is based on one very specific real-life situation; we did not test a multitude of situations. The results of this study are therefore not generalizable to other studies unless conditions are similar. The strength of this study lies in the fact that the chosen parameters rely on real observed data. However, in the simulation study we measured an R² of 0.38 for the predictability of treatment by treatment preference, while in the actual data we see an R² of 0.21. This shows that, in this case, our IV would be a weaker predictor of the actual treatment, and instruments that are too weak can lead to inconsistency and bias [37, 38].