Assessing the properties of patient-specific treatment effect estimates from causal forest algorithms under essential heterogeneity

Abstract

Background

Treatment variation from observational data has been used to estimate patient-specific treatment effects. Causal Forest Algorithms (CFAs) developed for this task have unknown properties when treatment effect heterogeneity from unmeasured patient factors influences treatment choice – a condition known as essential heterogeneity.

Methods

We simulated eleven populations with identical treatment effect distributions based on patient factors. The populations varied in the extent to which treatment effect heterogeneity influenced treatment choice. We used the generalized random forest application (CFA-GRF) to estimate patient-specific treatment effects for each population. Average differences between true and estimated effects for patient subsets were evaluated.

Results

CFA-GRF performed well across the population when treatment effect heterogeneity did not influence treatment choice. Under essential heterogeneity, however, CFA-GRF yielded treatment effect estimates that reflected true treatment effects only for treated patients and were on average greater than true treatment effects for untreated patients.

Conclusions

Patient-specific estimates produced by CFAs are sensitive to why patients in real-world practice make different treatment choices. Researchers using CFAs should develop conceptual frameworks of treatment choice prior to estimation to guide estimate interpretation ex post.

Introduction

Developing patient-specific treatment effect evidence to guide individualized treatment decision-making is a cornerstone of patient-centered care [1,2,3]. The need for patient-specific evidence follows from the acknowledged breadth of outcome variation across patients receiving the same treatment [4,5,6,7,8,9,10]. This phenomenon is known as treatment effect heterogeneity and is defined as “nonrandom variation in the direction or magnitude of a treatment effect” [11]. With their restrictive inclusion/exclusion criteria, randomized controlled trials (RCTs) cannot generate appropriate patient-specific evidence for many patients [4, 11,12,13,14]. As an alternative, observational data provide treatment variation within the context of real-world practice and a diversity of patients well beyond those evaluated in RCTs [2, 3, 12, 15, 16]. The traditional approach to estimating patient-specific treatment effects using observational data is to apply parametric estimators and assign to each patient an estimated treatment effect from a “reference class” of patients [17,18,19,20,21,22]. Reference classes are defined a priori by the researcher based on combinations of measured patient factors that are conceptually associated with treatment effect heterogeneity [17,18,19,20,21,22]. The need to specify reference classes a priori has been described as “the central problem when using group evidence to forecast outcomes (or treatment effects) in individuals” [18]. Even with a small number of measured patient factors, a patient could be placed in many reference classes, leaving it unclear which class is best aligned to the patient [10, 17, 18].

Causal forest algorithms (CFAs) have been proposed to estimate patient-specific treatment effects in a manner that essentially assigns patients to reference classes ex post using information from the data, thereby eliminating the need to assign patients to reference classes a priori [23,24,25,26,27,28,29,30,31,32,33]. Simulation modeling has shown that CFAs can accurately estimate patient-specific treatment effects in scenarios in which treatment effect heterogeneity does not influence treatment choice [24, 26,27,28,29, 34,35,36,37]. However, in many real-world scenarios it is conceivable that unmeasured patient factors associated with treatment effectiveness influence treatment choice. This condition is called essential heterogeneity, or sorting on the gain, in the econometrics literature [38,39,40,41,42,43,44,45,46,47,48,49,50,51]. The properties of parametric treatment effect estimators under essential heterogeneity are well known [38,39,40,41,42,43,44,45,46,47,48,49,50,51]. However, the impact of essential heterogeneity on patient-specific treatment effect estimates from CFAs has not been evaluated. In this paper, we contrast the properties of patient-specific treatment effect estimates from the causal forest algorithm within the generalized random forests application (CFA-GRF) across simulation scenarios that vary in the extent to which unmeasured patient factors associated with treatment effectiveness influence treatment choice.

Methodological background

Assigning patients into appropriate reference classes using observational data either a priori with parametric estimators or ex post through a CFA does not ensure that the resulting treatment effect estimates are appropriate for each patient. The conventional criticism of using observational data to estimate treatment effects is the risk of omitted variable bias, in which unmeasured factors with direct effects on study outcomes are distributed differently between treated and untreated patients [52]. However, even if patients were assigned to appropriate reference classes and omitted variable bias risk is mitigated through study design, a single treatment effect estimate for a reference class may not be appropriate for each patient within a class. The econometric literature has shown that parametric estimators yield average treatment effect estimates for patient subsets based on treatment choice [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. Under the assumption of no omitted variable bias, regression-based estimators yield unbiased estimates of the average treatment effect for the subset of patients who chose treatment, or the average treatment effect on the treated (ATT) [43, 48,49,50, 54, 57, 60, 68, 69]. Consequently, if treatment choice in an empirical setting was influenced by unmeasured patient factors related to treatment effectiveness – essential heterogeneity – the parametric estimate of ATT for a reference class will overstate the true treatment effects for the untreated patients in the class [39, 49, 50, 70]. Researchers using parametric estimators have learned not to generalize a single parametric treatment effect estimate to all patients in a population [38, 43, 47,48,49,50,51, 53, 55, 56, 58, 59, 61, 67, 70, 71].

In contrast, the properties of estimated patient-specific treatment effects from CFAs under essential heterogeneity have not been explored. Simulation research has demonstrated that CFAs accurately yield patient-specific treatment effects under the broad condition of ignorability [24, 26,27,28,29, 34,35,36]. Ignorability assumes that omitted variable bias does not exist within an empirical setting; it also assumes that essential heterogeneity does not exist. These dual assumptions can be described using potential outcome notation. Define Y1i and Y0i as the potential outcomes for patient “i” when treated and untreated, respectively, so that (Y1i – Y0i) is the true treatment effect for patient “i”. Define Ti as the observed treatment choice for patient “i” and Xi as the set of measured patient factors available to the researcher. Ignorability is broadly defined as (Y1i, Y0i) \(\perp\) Ti | Xi; that is, conditional on Xi, treatment choice is independent of both potential patient outcomes [72]. As such, ignorability implies the following two distinct assumptions.

$$Y_{0i} \perp T_{i} \mid X_{i}$$
(I.1)

Assumption (I.1) says that, within a reference class of patients based on Xi, treatment choice is unrelated to untreated potential outcomes across patients. Or stated differently, treatment choice is unrelated to unmeasured patient factors associated with Y0i. Assuming (I.1) eliminates the risk of omitted variable bias in an observational study [52].

Even if assumption (I.1) holds, though, treatment effects may remain heterogeneous within a reference class defined by Xi. With respect to this heterogeneity, ignorability further assumes:

$$\left(Y_{1i} - Y_{0i}\right) \perp T_{i} \mid X_{i}$$
(I.2)

Assumption (I.2) says that, within a reference class of patients defined by Xi, treatment choice is not influenced by unmeasured patient factors associated with treatment effectiveness – that is, there is no essential heterogeneity [38, 39, 45]. If ignorability holds within a reference class defined by Xi, only the treatment variation that stems from patient factors unrelated to treatment effectiveness will be used to estimate treatment effects within the class. Consequently, CFA simulation results that assume ignorability provide no guidance on the properties of patient-specific treatment effect estimates in real-world scenarios in which essential heterogeneity is thought to exist a priori. For example, the effectiveness of surgery for patients with shoulder fractures is thought to vary with fracture complexity and patient resiliency, which in turn influence surgery choice [73,74,75,76,77], but fracture complexity and patient resiliency are not measurable in large observational databases such as Medicare claims data [73,74,75,76,77]. A study using a causal forest algorithm to estimate patient-specific surgery effects from Medicare claims data theorized a priori that the resulting estimates should be interpreted in terms of essential heterogeneity, but evidence was not available to guide these interpretations [78]. In addition, understanding the influence of essential heterogeneity on CFA estimates is especially relevant to researchers proposing to use CFAs in effectiveness-implementation hybrid study designs, in which the promotion of a treatment is randomized to satisfy assumption (I.1) but decision makers retain the discretion to choose among available treatments based on individual patient factors [79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95].

To provide this guidance, this study modified a treatment choice-based simulation method used in previous research to assess the impact of essential heterogeneity on patient-specific treatment effect estimates from a CFA estimator [43, 48, 53]. Eleven patient populations were simulated with the same distribution of true treatment effects drawn from identical distributions of simulated patient factors. All eleven simulations were specified to satisfy assumption (I.1). The simulations varied by plausible differences in the extent to which knowledge of true patient-specific treatment effects influenced treatment choice. We used the causal forest algorithm within the generalized random forests application (CFA-GRF) [24,25,26, 96, 97] to estimate patient-specific treatment effects for each simulated population. CFA-GRF has been singled out as the most appropriate CFA for estimating patient-specific treatment effects [98]. To tease out the influence of essential heterogeneity, we applied CFA-GRF to each simulated population under conditions of (1) fully observed heterogeneity in which all patient factors associated with treatment effect heterogeneity are observed by the researcher and (2) partially observed heterogeneity in which only a subset of the patient factors associated with treatment effect heterogeneity are observed by the researcher. Patient-specific treatment effect estimates from CFA-GRF were used to calculate the average absolute and average percentage differences between true and estimated effects for each simulated population and for treatment choice-based population subsets.

Methods

Simulation model

Our simulation model follows the general framework in the essential heterogeneity literature [39, 43, 45, 48, 53, 99]. Figure 1 contains a directed acyclic graph (DAG) illustrating the conceptual framework of treatment effect heterogeneity, treatment choice, and outcome within our simulations. Figure 1 was adapted from standard DAG approaches to reflect patient factors affecting treatment effectiveness and the treatment effect knowledge of the decision maker [100, 101]. Outcome (Yi) equals 1 if patient “i” is cured of the medical condition and 0 if not cured. P(Yi|Ti,Si) is the probability of cure for patient “i” conditional on treatment choice (Ti) and patient severity (Si). Patient cure probability also varies with other accumulated factors (Wi). Treatment (Ti) equals 1 if the patient receives treatment and 0 if the patient receives the alternative, which we designate as watchful waiting. In all simulations, the true absolute treatment effect of Ti on Yi for each patient “i” (TEi), relative to watchful waiting, varies with six factors X1i, X2i, X3i, X4i, X5i, and X6i based on the following equation:

Fig. 1 Directed Acyclic Graph (DAG) Describing the Conceptual Framework for the Simulation Model in which Patient Factors Affecting Treatment Effectiveness Affect Treatment Choice through Decision Maker Knowledge

$${\text{TE}}_{\text{i}}\left({\text{X}}_{1\text{i}}, {\text{X}}_{2\text{i}}, {\text{X}}_{3\text{i}}, {\text{X}}_{4\text{i}}, {\text{X}}_{5\text{i}}, {\text{X}}_{6\text{i}}\right) = \upbeta_{1}{\ast}{\text{X}}_{1\text{i}} + \upbeta_{2}{\ast}{\text{X}}_{2\text{i}} + \upbeta_{3}{\ast}{\text{X}}_{3\text{i}} + \upbeta_{4}{\ast}{\text{X}}_{4\text{i}} + \upbeta_{5}{\ast}{\text{X}}_{5\text{i}} + \upbeta_{6}{\ast}{\text{X}}_{6\text{i}}$$
(1)

X1i, X2i, X3i, X4i, X5i, and X6i are binary variables, each drawn for every patient from a Bernoulli distribution with a probability of 0.5. Each βk equals the absolute change in treatment effect if a patient has condition Xk (β1 = 0.024, β2 = 0.048, β3 = 0.071, β4 = 0.095, β5 = 0.119, β6 = 0.143). With these parameter values, simulated patients have true treatment effects ranging from 0 to 0.5, with an average true treatment effect of 0.25 in each simulated population. For example, if the simulated patient factors for patient “i” (X1i,X2i,X3i,X4i,X5i,X6i) were (1,0,1,0,1,0), then patient “i’s” true TEi was 0.214 = (0.024 + 0 + 0.071 + 0 + 0.119 + 0). Figure 2 illustrates the identical distribution of simulated treatment effects used in all eleven simulations in this study.
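To make this data-generating step concrete, the following is a minimal R sketch of Eq. (1); the variable names and seed are ours, not from the study code:

```r
# Six Bernoulli(0.5) patient factors and the implied true treatment
# effect TE_i for each of 50,000 simulated patients (Eq. 1).
set.seed(1)
n <- 50000                                 # patients per simulated population
beta <- c(0.024, 0.048, 0.071, 0.095, 0.119, 0.143)
X <- matrix(rbinom(n * 6, size = 1, prob = 0.5), nrow = n)
TE <- as.vector(X %*% beta)                # true TE_i, ranging from 0 to 0.5
mean(TE)                                   # approximately 0.25 by construction
```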

Fig. 2 Distribution of True Absolute Treatment Effects (TEi) Used in All Eleven Simulated Populations

The true cure probability relationship for each simulated patient “i”, signified by the red arrows in Fig. 1, is as follows:

$$\mathrm{Probability}\;\mathrm{of}\;{\mathrm{Y}}_\mathrm{i}\mathrm{=}\mathrm P\left({\mathrm{Y}}_\mathrm{i}\left|{\mathrm{T}}_\mathrm{i},{\mathrm{S}}_\mathrm{i}\right.\right)+{\mathrm{W}}_\mathrm{i}=\left({\mathrm\alpha}_{0}+{\mathrm\alpha}_\mathrm{S}\cdot{\mathrm{S}}_\mathrm{i}+{\mathrm{TE}}_\mathrm{i}\left({\mathrm{X}}_{1\mathrm{i}},{\mathrm{X}}_{2\mathrm{i}},{\mathrm{X}}_{3\mathrm{i}},{\mathrm{X}}_{4\mathrm{i}},{\mathrm{X}}_{5\mathrm{i}},{\mathrm{X}}_{6\mathrm{i}}\right)\cdot{\mathrm{T}}_\mathrm{i}\right)+{\mathrm{W}}_\mathrm{i}$$
(2)

α0 equals the untreated patient cure probability at the mean severity level and was set to 0.1 in all simulations. Patient severity (Si) was specified as a uniformly distributed random variable ranging from -0.5 to 0.5. αS equals the change in untreated patient cure probability per unit difference in severity and was set to -0.1 in all simulations. As a result, in each simulated population, watchful waiting patients (Ti = 0) had a cure probability ranging from 0.05 to 0.15, and treated patients (Ti = 1) had a cure probability ranging from 0.05 to 0.65. All other unmeasured patient factors affecting the probability of cure are captured by Wi.

The green arrows in Fig. 1 describe the treatment choice process that varied across the eleven simulations. In each simulation, it is assumed that the treatment decision-maker observes X1i, X2i, X3i, X4i, X5i, and X6i and forms an expected treatment effect for patient “i”. The simulations differ by the knowledge available to decision makers of the relationship between the six patient factors and treatment effectiveness, as represented by the expected treatment effect function for simulation “j”:

$${\text{ETE}}_\text{ij}\left({\text{X}}_{1\text{i}},{\text{X}}_{2\text{i}},{\text{X}}_{3\text{i}},{\text{X}}_{4\text{i}},{\text{X}}_{5\text{i}},{\text{X}}_{6\text{i}},{\text{K}}_\text{j}\right)={\text{K}}_\text{j}\ast\left({\text{TE}}_\text{i}\left({\text{X}}_{1\text{i}},{\text{X}}_{2\text{i}},{\text{X}}_{3\text{i}},{\text{X}}_{4\text{i}},{\text{X}}_{5\text{i}},{\text{X}}_{6\text{i}}\right)-0.25\right)+0.25$$
(3)

\(K_j \in \left\{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\right\}\) is the proportion of the patient-specific deviation of TEi from the population average treatment effect that decision makers use in simulation “j”. Decision makers become more aware of each patient’s true treatment effect relative to the average population treatment effect as Kj increases from 0 to 1 across simulations. For example, in the simulation in which Kj = 0, decision makers have knowledge only of the average treatment effect across the population (0.25) when making treatment decisions for each patient. Alternatively, when Kj = 1, decision makers have exact knowledge of the treatment effect for patient “i” from the observed X1i, X2i, X3i, X4i, X5i, and X6i. ETEij(X1i, X2i, X3i, X4i, X5i, X6i, Kj) is used to calculate the expected value of treatment for patient “i” based on the following:

$${\mathrm{EVT}}_{\mathrm i}\left({\mathrm{ETE}}_{\mathrm{ij}},\mathrm V,\mathrm C,{\mathrm U}_{\mathrm i}\right)=\mathrm V\cdot{\mathrm{ETE}}_{\mathrm{ij}}\left({\mathrm X}_{1\mathrm i},\;{\mathrm X}_{2\mathrm i},\;{\mathrm X}_{3\mathrm i},\;{\mathrm X}_{4\mathrm i},\;{\mathrm X}_{5\mathrm i},\;{\mathrm X}_{6\mathrm i},\;{\mathrm K}_{\mathrm j}\right)-\mathrm C+{\mathrm U}_{\mathrm i}$$
(4)

EVTi(ETEij,V,C,Ui) sums the expected benefits and detriments (e.g., costs) of treatment relative to watchful waiting for patient “i”, conditional on knowledge Kj, the factors X1i, X2i, X3i, X4i, X5i, and X6i, direct treatment cost C, cure value V, and Ui, other accumulated factors affecting treatment value that are independent of treatment effectiveness for patient “i”. ETEij(X1i, X2i, X3i, X4i, X5i, X6i, Kj) equals the decision maker’s expected change in cure probability from treatment. To focus this study on the impact of essential heterogeneity across simulations, all patients were assigned a cure value V of $800 and a treatment cost C of $200. These values were chosen because they yield simulated population treatment percentages of approximately 50%. V values of $500 and $1100 were also tried; these yielded different population treatment percentages but did not change the interpretation of our results with respect to essential heterogeneity. Ui is the source of treatment valuation that varies across patients, is unrelated to treatment effectiveness, and is unmeasured by the researcher. Ui values were assigned to patients from a normal distribution with a mean of zero and a common variance \({\sigma }_{U}^{2}\) across simulations. Furthermore, in all simulations, Ui was specified independently of Wi, so that the unmeasured factors influencing treatment choice had no relationship with the unmeasured factors directly affecting cure and ignorability assumption (I.1) was satisfied.

In all simulations, decision makers chose treatment for patient “i” if EVTi was positive and watchful waiting if EVTi was negative. In the simulation in which knowledge of patient-specific treatment effect heterogeneity is zero (Kj = 0), only variation in Ui leads to different treatment choices across simulated patients. As Kj increases across simulations, a larger proportion of the variation in treatment choice is attributable to treatment effectiveness, or sorting on the gain. Once a treatment was chosen for each patient, cure (Yi) was simulated as a Bernoulli draw with probability P(Yi|Ti,Si). Table 1 summarizes the model parameters and values used in the simulations.

Table 1 Summary of simulation model parameters
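To illustrate how Eqs. (2) through (4) fit together, the sketch below continues the R example above for a single simulation. The knowledge level K and the standard deviation of Ui are illustrative assumptions; the paper reports Kj ∈ {0, 0.1, …, 1} and a normal Ui with an unreported common variance, and Wi is omitted for brevity:

```r
# One simulated population: expected treatment effects, treatment choice,
# and cure outcomes (Eqs. 2-4); K and sd(U) are assumed for illustration.
K <- 0.5                                  # decision-maker knowledge level K_j
V <- 800; C <- 200                        # cure value and treatment cost
U <- rnorm(n, mean = 0, sd = 100)         # valuation factors unrelated to TE_i
S <- runif(n, min = -0.5, max = 0.5)      # patient severity
ETE <- K * (TE - 0.25) + 0.25             # Eq. (3): expected treatment effect
EVT <- V * ETE - C + U                    # Eq. (4): expected value of treatment
T_i <- as.integer(EVT > 0)                # treat when expected value is positive
p_cure <- 0.1 - 0.1 * S + TE * T_i        # Eq. (2): alpha_0 = 0.1, alpha_S = -0.1
Y <- rbinom(n, size = 1, prob = p_cure)   # simulated cure indicator
```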

To support large sample properties, we generated 50,000 patients in each simulation. The blue arrows in Fig. 1 describe the variables observed by the researcher after each simulation. By varying the decision makers’ knowledge of TEi across simulations with Kj and varying the patient factors observed by the researcher, we can tease out the impacts of essential heterogeneity on patient-specific treatment effect estimates. In each scenario, researchers observe Ti, Yi, and Si. We designate “fully observed heterogeneity” as the empirical condition in which researchers observe all six patient factors X1i, X2i, X3i, X4i, X5i, and X6i. We designate “partially observed heterogeneity” as the empirical condition in which researchers observe only X1i, X2i, X3i, and X4i. Under fully observed heterogeneity, treatment effects are homogeneous within each reference class spanned by combinations of the complete set of patient factors. When Kj = 0, decision makers are not knowledgeable of the sources of treatment effect heterogeneity, and treatment choice varies only with Ui. Under fully observed heterogeneity with Kj > 0, decision makers are at least partly knowledgeable of the sources of treatment effect heterogeneity, with the effect of this knowledge on treatment choice increasing with Kj. Under partially observed heterogeneity, treatment effects are heterogeneous within the reference classes defined by the observed set of patient factors. Partially observed heterogeneity with Kj = 0 has been dubbed nonessential heterogeneity in the econometric literature [38, 39]. Under nonessential heterogeneity, treatment choice is not influenced by the unmeasured patient factors affecting treatment effectiveness within a reference class. Scenarios with partially observed heterogeneity and Kj > 0 represent essential heterogeneity. In these scenarios, treatment effects are heterogeneous within each reference class, with the influence of treatment effect heterogeneity on treatment choice increasing with Kj across simulations.

Estimation methods

Simulated population summaries

Treatment effect estimation using observational data requires a common area of support, or overlap, between treated and untreated patients: patients with the same measured factors must be observed making different treatment choices [102, 103]. It has been shown that including patients with insufficient overlap in study populations can lead to biased treatment effect estimates [104, 105]. The treatment choice-based simulations used here naturally reduce overlap the more that treatment choice is influenced by patient factors affecting treatment effectiveness. To monitor this influence across simulations, we used the SAS PROC LOGISTIC procedure to estimate the treatment propensity score for each patient in each simulated population under both “fully observed heterogeneity” and “partially observed heterogeneity”. Each simulated patient was then assigned either to the “overlapped” subset, with a propensity score between 0.05 and 0.95, or to the “nonoverlapped” subset, with a propensity score below 0.05 or above 0.95 [104, 105]. We then estimated the percentage of patients in each simulated population who were treated, untreated, overlapped and treated, overlapped and untreated, nonoverlapped and treated, and nonoverlapped and untreated, and calculated the true average TEi in each subset.
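As a sketch of this step (the paper used SAS PROC LOGISTIC; this is an R analogue continuing the example above):

```r
# Propensity scores under the fully observed specification and the
# 0.05-0.95 overlap designation; true mean TE_i by choice/overlap subset.
ps <- fitted(glm(T_i ~ X + S, family = binomial()))
overlapped <- ps > 0.05 & ps < 0.95
tapply(TE, list(treated = T_i, overlapped = overlapped), mean)
```

The partially observed specification would use only the first four factors (X[, 1:4] in place of X).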

Next, for each simulated population, we estimated a linear probability model (LPM) of treatment choice Ti on true TEi using the SAS PROC REG procedure with the SCORR1 option. This procedure provides the percentage of treatment choice variation within the simulated population that is attributable to variation in the true treatment effect, which serves as a measure of the influence of the true treatment effect on treatment choice. Last, we estimated the effect of Ti and Si on Yi using an LPM in each simulated population. The parametric treatment effect literature shows that the LPM estimator of the parameter on Ti yields a consistent estimate of the average absolute treatment effect on the treated in each simulated population [43, 48,49,50, 54, 57, 60, 68, 69].
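A hedged R analogue of these two regressions (the paper used SAS PROC REG), continuing the sketch:

```r
# Share of treatment-choice variation explained by the true effect
# (column B of Table 2) and the LPM outcome model, whose coefficient
# on T_i is a consistent estimate of the ATT.
summary(lm(T_i ~ TE))$r.squared
coef(lm(Y ~ T_i + S))["T_i"]
```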

Causal forest algorithm

We then applied the CFA-GRF [24,25,26, 96, 97] using the “grf” package in R [106] to estimate treatment effects for each patient in each simulated population. CFA-GRF evolved from standard classification and regression tree (CART) and random forest ensemble methods [24,25,26, 96, 97]. CART procedures iteratively partition “nodes” of observations within a population into subnodes or “branches” based on measured factors in a manner that maximizes the differences in an outcome across possible branches [97]. A tree is the full set of branches grown from the study population. The final subnode, or leaf, at the end of a branch can be thought of as an algorithm-generated ex post reference class for observations with factors matching the leaf. The random forest approach is an ensemble method that generates a “forest” of CART trees through resampling from the study population [96]. The estimated outcome for a single observation is the average outcome across the leaves containing that observation in the trees of the forest. CFA-GRF extends the random forest approach to estimating the causal effect of a predictor of interest (e.g., a treatment) on an outcome. CFA-GRF partitions observations based on measured factors in a manner that maximizes the expected differences in the estimated treatment effect on an outcome [24,25,26]. For each simulated population, CFA-GRF was run using 4000 trees, a minimum leaf size of 50, and the “honest” approach suggested by the algorithm’s creators, in which trees were estimated using a randomly selected 25% of the simulated population [26]. We ran CFA-GRF specifying X1i, X2i, X3i, X4i, X5i, X6i, and Si in the “fully observed heterogeneity” specification and X1i, X2i, X3i, X4i, and Si in the “partially observed heterogeneity” specification. As a result, each patient in each simulated population had two treatment effect estimates. We assessed the properties of these estimates by evaluating their ability to identify average treatment effect parameters for each simulated population and for treatment choice-based subsets of the population. We calculated the average absolute and average percentage differences between the true treatment effect for each simulated patient (TEi) and the estimated treatment effects for the full population and for subsets of the population based on treatment choice and propensity score “overlap” status.
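A minimal sketch of this estimation step with the grf package follows. The grf defaults sample.fraction = 0.5 and honesty.fraction = 0.5 give each honest tree roughly 25% of the population for placing splits, which is one plausible mapping of the settings reported above:

```r
# CFA-GRF under the fully and partially observed heterogeneity
# specifications; predictions are patient-specific treatment effects.
library(grf)
cf_full <- causal_forest(cbind(X, S), Y, T_i,
                         num.trees = 4000, min.node.size = 50, honesty = TRUE)
cf_part <- causal_forest(cbind(X[, 1:4], S), Y, T_i,
                         num.trees = 4000, min.node.size = 50, honesty = TRUE)
tau_full <- predict(cf_full)$predictions
tau_part <- predict(cf_part)$predictions
```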

Results

Summary information across simulated populations

Table 2 summarizes each simulated population. Column A in Table 2 shows the proportion of each patient’s treatment effect expectation (ETEij) shaped by the patient’s true effect (TEi) in each simulation – Kj from Eq. (3). Column B shows the percentage of treatment choice variation in each simulation explained by TEi. Columns C and D show the percentage of simulated patients who overlapped, i.e., had propensity scores between 0.05 and 0.95, in the fully observed heterogeneity and partially observed heterogeneity scenarios, respectively. Columns E through J show the true average TEi for the subsets of treated, untreated, overlapped and treated, overlapped and untreated, nonoverlapped and treated, and nonoverlapped and untreated patients, respectively. These columns also show in parentheses the percentage of patients within each subset.

Table 2 Summary information for simulated populations

Patient-specific treatment effects (TEi) do not influence treatment choice in simulation 1; as a result, the average true TEi is close to the true population average treatment effect of 0.25 for both treated and untreated patients. Moving from simulation 2 through simulation 11, though, decision makers’ knowledge of TEi increases, and TEi explains a larger portion of the variation in treatment choice (column B). Under fully observed heterogeneity, all patients are fully overlapped in simulations 1 through 6, and the percentage of overlapped patients falls from 97.0% to 68.8% across simulations 7 through 11. Under partially observed heterogeneity, all patients overlapped in all simulations. Columns E and F show how the greater influence of TEi on treatment choice leads to sorting on the gain: the average TEi for treated patients (column E) increased from 0.250 to 0.329 as Kj increased from 0 to 1, while the average TEi for untreated patients (column F) fell from 0.251 to 0.172 across this range. Columns G through J stratify treated and untreated patients by overlap status under fully observed heterogeneity. The average TEi of nonoverlapped treated patients (column I) is greater than that of overlapped treated patients (column G). Likewise, the average TEi of nonoverlapped untreated patients (column J) is less than that of overlapped untreated patients (column H). Column K of Table 2 shows the estimated treatment effect for the full population in each simulation using a linear probability model (LPM). A comparison of these estimates with column E confirms that the LPM yields estimates of the average treatment effect on the treated (ATT) [57]. When treatment effects are heterogeneous, LPM estimates generalize appropriately to untreated patients only when TEi does not influence treatment choice, as in simulation 1 [57].

CFA-GRF results under fully observed heterogeneity

Table 3 contains the average percentage differences between the true treatment effects and the patient-specific treatment effect estimates from CFA-GRF for each of the eleven simulated populations under fully observed heterogeneity. Estimates are reported for the full population in each simulation and for treatment choice-based subsets. Table A.1 in Additional file 1 shows these results as average absolute differences between the true and estimated treatment effects. The percentage differences in Table 3 were calculated using the average true treatment effect for each population subset in Table 2 and the average absolute differences for each subset in Table A.1. For example, the average percentage difference between the estimated and true treatment effect values for the full population in simulation 1 under fully observed heterogeneity is 100*(-0.0014)/0.25 = -0.56%. Column E of Table 3 shows that, under fully observed heterogeneity, CFA-GRF on average produces treatment effect estimates that reflect each population across simulations. However, as treatment choice becomes more responsive to TEi, CFA-GRF estimates increasingly understate the true treatment effect for treated patients and overstate the true treatment effect for untreated patients. Simulation 1 under fully observed heterogeneity fully satisfies ignorability, and CFA-GRF produces patient-specific treatment effect estimates that on average reflect the true patient treatment effects for the entire population and for both treated and untreated patient subsets. In contrast, in simulation 11, in which decision makers have full knowledge of TEi when making treatment choices, the treatment effect estimates for treated patients are on average 14.74% lower than their true values, and the estimates for untreated patients are on average 30.99% higher than their true values. These percentage differences are not symmetric because untreated patients have a lower average true treatment effect. Columns G to J in simulations 6 through 11 demonstrate that these differences exist for both overlapped and nonoverlapped patients but are more pronounced for nonoverlapped patients.
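The subset summaries reported in Table 3 reduce to a short calculation; continuing the R sketch above, for example, the average percentage difference for untreated patients is:

```r
# Average percentage difference between estimated and true effects for
# the untreated subset, mirroring the worked example in the text.
untreated <- T_i == 0
100 * mean(tau_full[untreated] - TE[untreated]) / mean(TE[untreated])
```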

Table 3 Average Percentage Differences Between the Estimated Treatment Effects and True Treatment Effects from the Causal Forest Algorithm within the Generalized Random Forests Application (CFA-GRF) Under Fully Observed Heterogeneity Across Simulated Populations Which Differ by the Extent That Treatment Effect Influences Treatment Choice

CFA-GRF results under partially observed heterogeneity

Table 4 contains the average percentage differences between the true treatment effect values and CFA-GRF treatment effect estimates for each simulated population under partially observed heterogeneity. Under partially observed heterogeneity, all patients are overlapped, so columns G through J of Table 3 are unnecessary. Under ignorability in simulation 1, CFA-GRF again produces estimates that on average are close to the true patient treatment effects for the entire population and for the treated and untreated patient subsets. In simulation 1, CFA-GRF estimates under partially observed heterogeneity had larger standard errors than those under fully observed heterogeneity (see Table A.2). Treatment effects estimated from CFA-GRF for treated patients closely reflect their true values across all eleven simulations. In contrast, CFA-GRF estimates for untreated patients are higher than their true values across simulations 2 through 11, with the differences increasing with the level of TEi influence on treatment choice. For example, based on the true average treatment effect for untreated patients from Table 2 and the average absolute differences for each population in Table A.1, CFA-GRF estimates for untreated patients are on average 2.4% greater than their true values in simulation 2 (100*(0.006)/0.246) and 76.3% greater than their true values in simulation 11 (100*(0.1312)/0.172). As a result, when TEi influences treatment choice under partially observed heterogeneity, CFA-GRF estimated treatment effects across the whole population are on average greater than their true values.

Table 4 Average Percentage Differences Between the Estimated Treatment Effects and True Treatment Effects from the Causal Forest Algorithm within the Generalized Random Forests Application (CFA-GRF) Under Partially Observed Heterogeneity Across Simulated Populations Which Differ by the Extent That Treatment Effect Influences Treatment Choice

Discussion

Causal forest algorithms (CFAs) have been proposed to estimate patient-specific treatment effect evidence using observational data [23,24,25,26,27,28,29,30,31,32,33, 107]. To apply CFAs, observational databases must contain patients with similar combinations of measured factors who were observed to make different treatment choices. The positive properties of CFAs for estimating patient-specific treatment effects have been established using simulation models under the assumption of ignorability [26,27,28,29, 34,35,36]. Under ignorability, only the treatment variation from unobserved patient factors not associated with treatment effect heterogeneity is available to estimate patient-specific treatment effects. Therefore, it is unknown whether the positive properties of CFAs extend to real-world clinical applications in which patient factors affecting treatment effectiveness also influence treatment choice. In many real-world clinical scenarios it is plausible, and indeed likely, that observed treatment choices reflect unmeasured patient factors related to expected treatment effectiveness for each patient – a condition defined in the econometric literature as essential heterogeneity [38, 39, 43, 48,49,50, 53]. This paper used simulations that varied only by the relationship between treatment effectiveness and treatment choice to assess the impact of essential heterogeneity on the ability of CFAs to estimate patient-specific treatment effects. The causal forest algorithm within the generalized random forests application (CFA-GRF) has been singled out as the most appropriate CFA for estimating patient-specific treatment effects and was used here [98]. To tease out the impacts of essential heterogeneity, CFA-GRF estimates were evaluated in settings in which all patient factors associated with treatment effect heterogeneity were fully observed by the researcher and in settings in which these factors were only partially observed by the researcher.

We replicated the positive properties of CFA-GRF in simulation scenarios under ignorability. Under ignorability, CFA-GRF yielded population-wide average estimates and treatment choice-based subset averages that were closely aligned with their true values, whether heterogeneity was fully or partially observed within the algorithm. As a result, if researchers can make a strong conceptual case a priori that treatment effectiveness is unrelated to treatment choice, they can be confident that CFA-GRF will yield appropriate treatment effect estimates across a population of patients. In simulation scenarios in which decision makers use patient factors associated with treatment effectiveness in making treatment decisions [38, 39, 43, 48,49,50, 53], the ability of CFA-GRF to identify patient-specific treatment effects varied with the influence that treatment effectiveness had on treatment choice and with whether the full set of patient factors associated with treatment effect heterogeneity was observed and specified in the algorithm. When all patient factors affecting treatment effect heterogeneity were fully specified, CFA-GRF produced treatment effect estimates that reflected true treatment effects across each population subset as long as the influence of treatment effectiveness on treatment choice was low. As this influence increased, however, treatment effect estimates showed increasingly negative bias for treated patients and positive bias for untreated patients. A substantial portion of this bias is likely attributable to nonoverlapped patients becoming a larger share of the population as the influence of treatment effectiveness on treatment choice increases. Under partially observed heterogeneity, all patients overlapped in all simulations. CFA-GRF produced estimates that closely reflected the true treatment effect values for treated patients across all levels of influence of treatment effectiveness on treatment choice. In contrast, CFA-GRF estimates for untreated patients were biased high, with the extent of this bias increasing with the level of influence that treatment effectiveness had on treatment choice.

As a result, CFA-GRF estimates of patient-specific treatment effects using observational data must be assessed through the prism of the assumed reasons why patients with similar measured factors in a real-world setting were observed making different treatment choices. This requires researchers to explicitly develop conceptual frameworks of treatment choice to support these assumptions a priori and to ensure proper interpretation of treatment effect estimates ex post. The call for treatment choice conceptual frameworks to guide treatment effectiveness research using observational data has long been made in economics [44, 48, 49, 108,109,110], and the importance of these frameworks is now being more widely appreciated [21, 111, 112]. A conceptual framework of treatment choice should describe the factors thought to influence treatment choice, the relationship of these factors to treatment effectiveness, and whether these factors are measured within the available data. Given the study findings, it is important for researchers to qualify patient-specific estimates from CFA-GRF in clinical scenarios in which essential heterogeneity likely exists. In these scenarios, researchers should state that patient-specific estimates from CFA-GRF are likely biased high for the average patient with a given combination of measured patient factors and are best aligned to the patients a provider is more likely to treat.

This study is limited by its use of only one of the several CFAs available to produce patient-specific evidence from observational data. While CFA-GRF has been singled out as the most appropriate CFA for estimating patient-specific treatment effects [98], other CFAs may be able to incorporate and correct for the conditions associated with treatment choice when estimating treatment effects. To this end, the simulated datasets produced here are available from the authors for use by other CFA developers to assess how the influence of treatment effect heterogeneity on treatment choice affects their treatment effect estimates. In addition, the simulation approach in this paper is reported fully, is straightforward to reproduce, and is easy to modify, so researchers can assess the robustness of our results to parameter changes.

Conclusion

The acknowledged breadth of treatment effect heterogeneity across patients heightens the need for empirical approaches that produce patient-specific treatment effect evidence [4,5,6,7,8,9,10]. Causal forest algorithms (CFAs) have been proposed to analyze the treatment variation found within large observational databases to develop patient-specific evidence [23,24,25,26,27,28,29,30,31,32,33]. The simulation results in this paper show that the patient-specific estimates produced by a CFA are sensitive to the reasons why patients with the same set of measured factors were observed to make different treatment choices. In many real-world clinical scenarios, decision makers are likely cognizant of how patient factors affect treatment effectiveness and use this information in making treatment decisions [38, 39, 43, 48,49,50, 53]. Moreover, many real-world decision makers may know more about the patient factors affecting treatment effectiveness than the researchers who collect measures for research [22, 113, 114]. As a result, it is foundational that researchers using CFAs to estimate patient-specific evidence from observational data build conceptual frameworks of treatment choice prior to estimation to guide estimate interpretation ex post.

Availability of data and materials

No datasets were generated or analysed during the current study.

Abbreviations

CFA: Causal forest algorithm

ATT: Average treatment effect on the treated

DAG: Directed acyclic graph

CART: Classification and regression tree

CFA-GRF: Causal forest algorithm - generalized random forest application

References

  1. Patient Centered Outcomes Research Institute. Our Programs. https://www.pcori.org/about-us/our-programs. Published 2017. Accessed 20 Mar 2019.

  2. Selby JV, Whitlock EP, Sherman KS, Slutsky JR. The Role of Comparative Effectiveness Research. In: Gallin JL, Ognibene FP, Johnson LL, editors. Principles and Practice of Clinical Research. 4th ed. London, UK: Elsevier; 2018. p. 269–92.

  3. Selby JV, Beal AC, Frank L. The Patient-Centered Outcomes Research Institute (PCORI) national priorities for research and initial research agenda. JAMA. 2012;307(15):1583–4.

  4. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q. 2004;82(4):661–87.

  5. Lohr KN, Eleazer K, Mauskopf J. Health policy issues and applications for evidence-medicine and clinical practice guidelines. Health Policy. 1998;46:1–19.

  6. Rothwell PM. Subgroup analysis in randomized controlled trials: importance, indications, and interpretation. Lancet. 2005;365:176–86.

  7. Starfield B. Threads and yarns: weaving the tapestry of comorbidity. Ann Fam Med. 2006;4(2):101–3.

  8. Steinberg EP, Luce BR. Evidence based? Caveat emptor! Health Affair. 2005;24(1):80–92.

  9. Upshur REG. Looking for rules in a world of exceptions. Perspect Biol Med. 2005;48(4):477–89.

  10. Dubois RW. From methods to policy: a “one-size-fits-all” policy ignores patient heterogeneity. J Comp Eff Res. 2012;1(2):119–20.

  11. Kent DM, Paulus JK, van Klaveren D, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann Intern Med. 2020;172(1):35–45.

  12. Deaton A, Cartwright N. Understanding and misunderstanding randomized controlled trials. Soc Sci Med. 2018;210:2–21.

  13. Concato J, Horwitz RI. Randomized trials and evidence in medicine: A commentary on deaton and cartwright. Soc Sci Med. 2018;210:32–6.

  14. Rekkas A, Paulus JK, Raman G, et al. Predictive approaches to heterogeneous treatment effects: a scoping review. BMC Med Res Methodol. 2020;20(1):264.

  15. Sox HC, Goodman SN. The methods of comparative effectiveness research. Annu Rev Publ Health. 2012;33:425–45.

  16. Kowalski CJ, Mrdjenovich AJ. Comparative effectiveness research: decision-based evidence. Perspect Biol Med. 2014;57(2):224–48.

  17. Dahabreh IJ, Hayward R, Kent DM. Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence. Int J Epidemiol. 2016;45(6):2184–93.

  18. Kent DM, Steyerberg E, van Klaveren D. Personalized evidence based medicine: predictive approaches to heterogeneous treatment effects. BMJ. 2018;363:k4245.

  19. Kent DM, van Klaveren D, Paulus JK, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) statement: explanation and elaboration. Ann Intern Med. 2020;172(1):W1–25.

  20. Wiemken TL, Kelley RR. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health. 2020;41:21–36.

  21. Crown WH. Real-world evidence, causal inference, and machine learning. Value Health. 2019;22(5):587–92.

  22. Dekkers OM, Mulder JM. When will individuals meet their personalized probabilities? A philosophical note on risk prediction. Eur J Epidemiol. 2020;35(12):1115–21.

  23. Athey S. Beyond prediction: using big data for policy problems. Science. 2017;355(6324):483–5.

  24. Athey S, Tibshirani J, Wager S. Generalized random forests. Ann Stat. 2019;47(2):1148–78.

  25. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci. 2016;113(27):7353–60.

  26. Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc. 2018;113(523):1228–42.

  27. Bargagli-Stoffi FJ, De-Witte K, Gnecco G. Heterogeneous causal effects with imperfect compliance: a novel Bayesian machine learning approach. arXiv preprint arXiv:190512707. 2019.

  28. Stoffi FJB, Gnecco G. Estimating heterogeneous causal effects in the presence of irregular assignment mechanisms. Paper presented at: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); 2018.

  29. Johnson M, Cao J, Kang H. Detecting heterogeneous treatment effect with instrumental variables. arXiv preprint arXiv:190803652. 2019.

  30. Bargagli-Stoffi FJ, Gnecco G. Causal tree with instrumental variable: an extension of the causal tree framework to irregular assignment mechanisms. Int J Data Sci Analytics. 2020;9(3):315–37.

  31. Wang G, Li J, Hopp WJ. An instrumental variable forest approach for detecting heterogeneous treatment effects in observational studies. Manag Sci. 2021. https://doi.org/10.1287/mnsc.2021.4084.

  32. Dusseldorp E, Doove L, Mechelen I. Quint: An R package for the identification of subgroups of clients who differ in which treatment alternative is best for them. Behav Res Methods. 2016;48(2):650–63.

  33. Su XG, Tsai CL, Wang HS, Nickerson DM, Li BG. Subgroup analysis via recursive partitioning. J Mach Learn Res. 2009;10:141–58.

  34. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. P Natl Acad Sci USA. 2016;113(27):7353–60.

  35. Wendling T, Jung K, Callahan A, Schuler A, Shah NH, Gallego B. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases. Stat Med. 2018;37(23):3309–24.

  36. Hahn PR, Dorie V, Murray JS. Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017. 2019. arXiv:1905.09515. https://doi.org/10.48550/arXiv.1905.09515. Accessed 1 May 2019.

  37. Jawadekar N, Kezios K, Odden MC, et al. Practical guide to honest causal forests for identifying heterogeneous treatment effects. Am J Epidemiol. 2023;192(7):1155–65.

  38. Basu A, Heckman JJ, Navarro-Lozano S, Urzua S. Use of instrumental variables in the presence of heterogeneity and self-selection: an application to treatments of breast cancer patients. Health Econ. 2007;16(11):1133–57.

  39. Heckman JJ, Urzua S, Vytlacil E. Understanding instrumental variables in models with essential heterogeneity. Rev Econ Stat. 2006;88(3):389–432.

  40. Basu A. Estimating decision-relevant comparative effects using instrumental variables. Stat Biosci. 2011;3(1):6–27.

  41. Ravallion M. On the implications of essential heterogeneity for estimating causal impacts using social experiments. J Econ Methods. 2015;4(1):145–51.

  42. Heckman J, Pinto R. The econometric model for causal policy analysis. Annu Rev Econom. 2022;14(1):893–923.

  43. Brooks JM, Chapman CG, Schroeder MC. Understanding treatment effect estimates when treatment effects are heterogeneous for more than one outcome. Appl Health Econ Health Policy. 2018;16(3):381–93.

  44. Heckman JJ. Econometric causality. Int Stat Rev. 2008;76(1):1–27.

  45. Heckman JJ, Vytlacil E. Structural equations, treatment effects, and econometric policy evaluation. Econometrica. 2005;73(3):669–738.

  46. Heckman JJ, Vytlacil EJ. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci U S A. 1999;96(8):4730–4.

  47. Basu A. Person-centered treatment (PeT) effects: Individualized treatment effects using instrumental variables. Stata J. 2015;15(2):397–410.

  48. Brooks JM, Fang G. Interpreting treatment-effect estimates with heterogeneity and choice: simulation model results. Clin Ther. 2009;31(4):902–19.

  49. Garrido MM, Dowd B, Hebert PL, Maciejewski ML. Understanding treatment effect terminology in pain and symptom management research. J Pain Symptom Manage. 2016;52(3):446–52.

  50. Smith J, Sweetman A. Viewpoint: estimating the causal effects of policies and programs. Can J Econ. 2016;49(3):871–905.

  51. Heckman JJ. Micro data, heterogeneity, and the evaluation of public policy: nobel lecture. J Polit Econ. 2001;109(4):673–748.

  52. Angrist JD, Pischke J-S. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, New Jersey: Princeton University Press; 2009.

  53. Chapman CG, Brooks JM. Treatment effect estimation using nonlinear two-stage instrumental variable estimators: another cautionary note. Health Serv Res. 2016;51(6):2375–94.

  54. Brooks JM, Chrischilles EA. Heterogeneity and the interpretation of treatment effect estimates from risk adjustment and instrumental variable methods. Med Care. 2007;45(10 Supl 2):123–30.

  55. Angrist JD, Fernandez-Val I. ExtrapoLATE-ing: external validity and overidentification in the LATE framework. In: Acemoglu D, Arellano M, Dekel E, editors. Advances in Economics and Econometrics, Vol. III: Econometrics. 2013. p. 401–33.

  56. Angrist JD. Treatment effect heterogeneity in theory and practice. Econ J. 2004;114:C52–83.

  57. Heckman JJ, Robb R. Alternative Methods for Evaluating the Impact of Interventions. In: Heckman JJ, Singer B, editors. Longitudinal Analysis of Labor Market Data. New York: Cambridge University Press; 1985. p. 156–245.

  58. Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994;62(2):467–75.

  59. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91(434):444–55.

  60. Angrist JD. Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice. J Business Econ Statistics. 2001;19(1):2–16.

  61. Moler-Zapata S, Grieve R, Basu A, O’Neill S. How does a local instrumental variable method perform across settings with instruments of differing strengths? A simulation study and an evaluation of emergency surgery. Health Econ. 2023;32(9):2113–26.

  62. Brooks JM, Chapman CG, Cozad MJ. The identification process using choice theory is needed to match design with objectives in CER. Med Care. 2017;55(2):91–3.

  63. Cozad MJ, Chapman CG, Brooks JM. Specifying a conceptual treatment choice relationship before analysis is necessary for comparative effectiveness research. Med Care. 2016;55(2):94–6.

  64. Heckman JJ. The scientific model of causality. Sociol Methodol. 2005;35:1–97.

  65. Angrist JD. Treatment effect heterogeneity in theory and practice. Econ J. 2003;114:1–30.

  66. Manski CF. [Choices as an alternative to control in observational studies]: comment. Stat Sci. 1999;14(3):279–81.

  67. Harris KM, Remler DK. Who is the marginal patient? understanding instrumental variables estimates of treatment effects. Health Serv Res. 1998;33(5):1337–60.

  68. Heckman JJ, Robb R. Alternative methods for evaluating the impact of interventions - an overview. J Econ. 1985;30(1–2):239–67.

  69. Blundell R, Costa DM. Evaluation methods for non-experimental data. Fisc Stud. 2000;21(4):427–68.

  70. Smith J. Treatment effect heterogeneity. Eval Rev. 2022;46(5):652–77.

  71. Brooks JM, Chrischilles EA. Heterogeneity and the interpretation of treatment effect estimates from risk adjustment and instrumental variable methods. Med Care. 2007;45(10):S123–30.

  72. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.

  73. Jayakumar P, Teunis T, Williams M, Lamb SE, Ring D, Gwilym S. Factors associated with the magnitude of limitations during recovery from a fracture of the proximal humerus predictors of limitations after proximal humerus fracture. Bone Joint J. 2019;101(6):715–23.

  74. Otlans PT, Szukics PF, Bryan ST, Tjoumakaris FP, Freedman KB. Current concepts review resilience in the orthopaedic patient. J Bone Joint Surg-Am. 2021;103(6):549–59.

  75. Ezeamama AE, Elkins J, Simpson C, Smith SL, Allegra JC, Miles TP. Indicators of resilience and healthcare outcomes: findings from the 2010 health and retirement survey. Qual Life Res. 2016;25(4):1007–15.

  76. Floyd SB, Walker JT, Smith JT, et al. ICD-10 diagnosis codes in electronic health records do not adequately capture fracture complexity for proximal humerus fractures. J Shoulder Elbow Surg. 2023;33(2):417–24.

  77. Floyd SB, Thigpen C, Kissenberth M, Brooks JM. Association of surgical treatment with adverse events and mortality among Medicare beneficiaries with proximal humerus fracture. JAMA Netw Open. 2020;3(1):e1918663.

  78. Brooks JM, Chapman CG, Floyd SB, Chen BK, Thigpen CA, Kissenberth M. Assessing the ability of an instrumental variable causal forest algorithm to personalize treatment evidence using observational data: the case of early surgery for shoulder fracture. BMC Med Res Methodol. 2022;22(1):190.

  79. Curran GM, Bauer M, Mittman B, Pyne JM, Stetler C. Effectiveness-implementation hybrid designs combining elements of clinical effectiveness and implementation research to enhance public health impact. Med Care. 2012;50(3):217–26.

  80. Landes SJ, McBain SA, Curran GM. An introduction to effectiveness-implementation hybrid designs. Psychiatry Res. 2019;280:112513.

  81. Curran GM, Landes SJ, McBain SA, et al. Reflections on 10 years of effectiveness-implementation hybrid studies. Front Health Serv. 2022;2:1053496.

  82. Wolfenden L, Williams CM, Wiggers J, Nathan N, Yoong SL. Improving the translation of health promotion interventions using effectiveness–implementation hybrid designs in program evaluations. Health Promot J Austr. 2016;27(3):204–7.

  83. Bernet AC, Willens DE, Bauer MS. Effectiveness-implementation hybrid designs: implications for quality improvement science. Implement Sci. 2013;8(1):S2.

  84. Ullman AJ, Beidas RS, Bonafide CP. Methodological progress note: Hybrid effectiveness-implementation clinical trials. J Hosp Med. 2022;17(11):912–6.

  85. Liang YY, Ehler BR, Hollenbeak CS, Turner BJ. Behavioral support intervention for uncontrolled hypertension: a complier average causal effect (CACE) analysis. Med Care. 2015;53(2):e9–15.

  86. Peugh JL, Strotman D, McGrady M, Rausch J, Kashikar-Zuck S. Beyond intent to treat (ITT): a complier average causal effect (CACE) estimation primer. J Sch Psychol. 2017;60:7–24.

  87. Knox CR, Lall R, Hansen Z, Lamb SE. Treatment compliance and effectiveness of a cognitive behavioural intervention for low back pain: a complier average causal effect approach to the BeST data set. BMC Musculoskelet Disord. 2014;15:1–1.

  88. Berg JK, Bradshaw CP, Jo B, Ialongo NS. Using Complier average causal effect estimation to determine the impacts of the good behavior game preventive intervention on teacher implementers. Adm Policy Ment Health. 2017;44(4):558–71.

  89. Gruber JS, Arnold BF, Reygadas F, Hubbard AE, Colford JM Jr. Estimation of treatment efficacy with complier average causal effects (CACE) in a randomized stepped wedge trial. Am J Epidemiol. 2014;179(9):1134–42.

  90. Connell AM. Employing complier average causal effect analytic methods to examine effects of randomized encouragement trials. Am J Drug Alcohol Abuse. 2009;35(4):253–9.

  91. Ashworth E, Panayiotou M, Humphrey N, Hennessey A. Game on: complier average causal effect estimation reveals sleeper effects on academic attainment in a randomized trial of the Good Behavior Game. Prev Sci. 2020;21(2):222–33.

  92. Panayiotou M, Humphrey N, Hennessey A. Implementation matters: using complier average causal effect estimation to determine the impact of the Promoting Alternative Thinking Strategies (PATHS) curriculum on children’s quality of life. J Educ Psychol. 2020;112(2):236–53.

  93. Carmody T, Greer TL, Walker R, Rethorst CD, Trivedi MH. A complier average causal effect analysis of the stimulant reduction intervention using dosed exercise study. Contemp Clin Trials Commun. 2018;10:1–8.

  94. Huang S, Cordova D, Estrada Y, Brincks AM, Asfour LS, Prado G. An application of the complier average causal effect analysis to examine the effects of a family intervention in reducing illicit drug use among high-risk Hispanic adolescents. Fam Process. 2014;53(2):336–47.

  95. Cowan JM. School choice as a latent variable: Estimating the “complier average causal effect” of vouchers in Charlotte. Policy Stud J. 2008;36(2):301–15.

  96. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  97. Breiman L, Friedman J, Olshen RA, Stone CJ. Classification and Regression Trees. CRC Press; 1984.

  98. McConnell KJ, Lindner S. Estimating treatment effects with machine learning. Health Serv Res. 2019;54(6):1273–82.

  99. Roy AD. Some thoughts on the distribution of earnings. Oxford Econ Pap. 1951;3(2):135–46.

  100. Weinberg CR. Can DAGs clarify effect modification? Epidemiology. 2007;18(5):569–72.

  101. Attia J, Holliday E, Oldmeadow C. A proposal for capturing interaction and effect modification using DAGs. Int J Epidemiol. 2022;51(4):1047–53.

  102. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46(3):399–424.

  103. Walker AM, Patrick AR, Lauer MS, et al. A tool for assessing the feasibility of comparative effectiveness research. Comparative Effect Res. 2013;3:11–20.

  104. Sturmer T, Rothman KJ, Avorn J, Glynn RJ. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution–a simulation study. Am J Epidemiol. 2010;172(7):843–54.

  105. Sturmer T, Webster-Clark M, Lund JL, et al. Propensity score weighting and trimming strategies for reducing variance and bias of treatment effect estimates: a simulation study. Am J Epidemiol. 2021;190(8):1659–70.

  106. Tibshirani J, Athey S, Sverdrup E, Wager S. instrumental_forest: Instrumental Forest. https://rdrr.io/cran/grf/man/instrumental_forest.html. Published 2021. Accessed 15 May 2021.

  107. Sadique Z, Grieve R, Diaz-Ordaz K, Mouncey P, Lamontagne F, O’Neill S. A machine-learning approach for estimating subgroup- and individual-level treatment effects: an illustration using the 65 trial. Med Decis Making. 2022;42(7):923–36.

  108. Cozad MJ, Chapman CG, Brooks JM. Specifying a conceptual treatment choice relationship before analysis is necessary for comparative effectiveness research. Med Care. 2017;55(2):94–6.

  109. Lewbel A. The identification zoo: meanings of identification in econometrics. J Econ Lit. 2019;57(4):835–903.

  110. Heckman JJ. Building bridges between structural and program evaluation approaches to evaluating policy. J Econ Lit. 2010;48(2):356–98.

  111. Ho M, van der Laan M, Lee H, et al. The current landscape in biostatistics of real-world data and evidence: causal inference frameworks for study design and analysis. Stat Biopharm Res. 2021;15:1–14.

  112. VanderWeele TJ, Mathur MB. Commentary: developing best-practice guidelines for the reporting of E-values. Int J Epidemiol. 2020;49(5):1495–7.

  113. Lesko CR, Henderson NC, Varadhan R. Considerations when assessing heterogeneity of treatment effect in patient-centered outcomes research. J Clin Epidemiol. 2018;100:22–31.

  114. Wilkinson J, Arnold KF, Murray EJ, et al. Time to reality check the promises of machine learning-powered precision medicine. Lancet Digit Health. 2020;2(12):e677–80.

Acknowledgements

The authors acknowledge the support of the University of South Carolina Big Data Health Science Center and the Center for Effectiveness Research in Orthopaedics.

Funding

This project was generously funded by a grant from the University of South Carolina Big Data Health Science Center and focused funding from the Center for Effectiveness Research in Orthopaedics.

Author information

Contributions

JMB created the simulation scenarios in the paper with conceptual and programming guidance from CGC, BKC, SF, and NH. JMB wrote the first draft of the manuscript, and BKC, CGC, SF, and NH provided key editorial guidance on its focus and direction.

Corresponding author

Correspondence to John M. Brooks.

Ethics declarations

Ethics approval and consent to participate

This study uses simulated data with no human interaction and was therefore designated “exempt” by the University of South Carolina Institutional Review Board under Category 4 of 45 CFR 46.101(b). All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable. Because this study involved no human interaction, the University of South Carolina Institutional Review Board deemed informed consent unnecessary under national regulations.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Brooks, J.M., Chapman, C.G., Chen, B.K. et al. Assessing the properties of patient-specific treatment effect estimates from causal forest algorithms under essential heterogeneity. BMC Med Res Methodol 24, 66 (2024). https://doi.org/10.1186/s12874-024-02187-5

Keywords