 Research
 Open access
Assessing the properties of patient-specific treatment effect estimates from causal forest algorithms under essential heterogeneity
BMC Medical Research Methodology volume 24, Article number: 66 (2024)
Abstract
Background
Treatment variation from observational data has been used to estimate patient-specific treatment effects. Causal Forest Algorithms (CFAs) developed for this task have unknown properties when treatment effect heterogeneity from unmeasured patient factors influences treatment choice – essential heterogeneity.
Methods
We simulated eleven populations with identical treatment effect distributions based on patient factors. The populations varied in the extent to which treatment effect heterogeneity influenced treatment choice. We used the causal forest algorithm within the generalized random forest application (CFA-GRF) to estimate patient-specific treatment effects for each population. Average differences between true and estimated effects for patient subsets were evaluated.
Results
CFA-GRF performed well across the population when treatment effect heterogeneity did not influence treatment choice. Under essential heterogeneity, however, CFA-GRF yielded treatment effect estimates that reflected true treatment effects only for treated patients and were on average greater than true treatment effects for untreated patients.
Conclusions
Patient-specific estimates produced by CFAs are sensitive to why patients in real-world practice make different treatment choices. Researchers using CFAs should develop conceptual frameworks of treatment choice prior to estimation to guide estimate interpretation ex post.
Introduction
Developing patient-specific treatment effect evidence to guide individualized treatment decision-making is a cornerstone of patient-centered care [1,2,3]. The need for patient-specific evidence follows from the acknowledged breadth of outcome variation across patients receiving the same treatment [4,5,6,7,8,9,10]. This phenomenon is known as treatment effect heterogeneity and is defined as “nonrandom variation in the direction or magnitude of a treatment effect” [11]. With their restrictive inclusion/exclusion criteria, randomized controlled trials (RCTs) cannot generate appropriate patient-specific evidence for many patients [4, 11,12,13,14]. As an alternative, observational data provide treatment variation within the context of real-world practice and a diversity of patients well beyond those evaluated in RCTs [2, 3, 12, 15, 16]. The traditional approach to estimating patient-specific treatment effects using observational data is to use parametric estimators and assign each patient an estimated treatment effect from a “reference class” of patients [17,18,19,20,21,22]. Reference classes are defined a priori by the researcher based on combinations of measured patient factors that are conceptually associated with treatment effect heterogeneity [17,18,19,20,21,22]. The need to specify reference classes a priori has been described as “the central problem when using group evidence to forecast outcomes (or treatment effects) in individuals” [18]. Even with a small number of measured patient factors, a patient could be placed in many reference classes, leaving it unclear which class is best aligned with the patient [10, 17, 18].
Causal forest algorithms (CFAs) have been proposed to estimate patient-specific treatment effects in a manner that essentially assigns patients to reference classes ex post using information from the data, thereby eliminating the need to assign patients to reference classes a priori [23,24,25,26,27,28,29,30,31,32,33]. Simulation modeling has shown that CFAs can accurately estimate patient-specific treatment effects in scenarios in which treatment effect heterogeneity does not influence treatment choice [24, 26,27,28,29, 34,35,36,37]. However, in many real-world scenarios it is conceivable that unmeasured patient factors associated with treatment effectiveness influence treatment choice. This condition is called essential heterogeneity, or sorting on the gain, in the econometrics literature [38,39,40,41,42,43,44,45,46,47,48,49,50,51]. The properties of parametric treatment effect estimators under essential heterogeneity are well known [38,39,40,41,42,43,44,45,46,47,48,49,50,51]. However, the impact of essential heterogeneity on patient-specific treatment effect estimates from CFAs has not been evaluated. In this paper, we contrast the properties of patient-specific treatment effect estimates from the causal forest algorithm within the generalized random forests application (CFA-GRF) across simulation scenarios that vary in the extent to which unmeasured patient factors associated with treatment effectiveness influence treatment choice.
Methodological background
Assigning patients to appropriate reference classes using observational data, either a priori with parametric estimators or ex post through a CFA, does not ensure that the resulting treatment effect estimates are appropriate for each patient. The conventional criticism of using observational data to estimate treatment effects is the risk of omitted variable bias, in which unmeasured factors with direct effects on study outcomes are distributed differently between treated and untreated patients [52]. However, even if patients were assigned to appropriate reference classes and omitted variable bias risk is mitigated through study design, a single treatment effect estimate for a reference class may not be appropriate for each patient within a class. The econometric literature has shown that parametric estimators yield average treatment effect estimates for patient subsets based on treatment choice [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. Under the assumption of no omitted variable bias, regression-based estimators yield unbiased estimates of the average treatment effect for the subset of patients who chose treatment, or the average treatment effect on the treated (ATT) [43, 48,49,50, 54, 57, 60, 68, 69]. Consequently, if treatment choice in an empirical setting is influenced by unmeasured patient factors related to treatment effectiveness – essential heterogeneity – the parametric estimate of ATT for a reference class will overstate the true treatment effects for the untreated patients in the class [39, 49, 50, 70]. Researchers using parametric estimators have learned not to generalize a single parametric treatment effect estimate to all patients in a population [38, 43, 47,48,49,50,51, 53, 55, 56, 58, 59, 61, 67, 70, 71].
In contrast, the properties of estimated patient-specific treatment effects from CFAs under essential heterogeneity have not been explored. Simulation research has demonstrated that CFAs accurately yield patient-specific treatment effects under the broad condition of ignorability [24, 26,27,28,29, 34,35,36]. Ignorability assumes that omitted variable bias does not exist within an empirical setting. However, ignorability also assumes that essential heterogeneity does not exist. These dual assumptions can be described using potential outcome notation. Define Y_{1i} and Y_{0i} as the potential outcomes for patient “i” when treated and untreated, respectively, so that (Y_{1i} – Y_{0i}) is the true potential treatment effect for patient “i”. Define T_{i} as the observed treatment choice for patient “i” and X_{i} as the set of measured patient factors available to the researcher. Ignorability is broadly defined as (Y_{1i}, Y_{0i}) \(\perp\) T_{i} | X_{i}, or, conditional on X_{i}, treatment choice is independent of both potential patient outcomes [72]. As such, ignorability implies two distinct assumptions, the first of which is: (I.1) Y_{0i} \(\perp\) T_{i} | X_{i}.
Assumption (I.1) says that, within a reference class of patients based on X_{i}, treatment choice is unrelated to untreated potential outcomes across patients. Or stated differently, treatment choice is unrelated to unmeasured patient factors associated with Y_{0i}. Assuming (I.1) eliminates the risk of omitted variable bias in an observational study [52].
Even if assumption (I.1) is true, though, treatment effects may remain heterogeneous within a reference class defined by X_{i}. With respect to this heterogeneity, ignorability further assumes: (I.2) (Y_{1i} – Y_{0i}) \(\perp\) T_{i} | X_{i}.
Assumption (I.2) says that, within a reference class of patients defined by X_{i}, treatment choice is not influenced by unmeasured patient factors associated with treatment effectiveness – that is, there is no essential heterogeneity [38, 39, 45]. If ignorability holds within a reference class defined by X_{i}, only the treatment variation that stems from patient factors unrelated to treatment effectiveness will be used to estimate treatment effects within the class. Consequently, CFA simulation results that assume ignorability provide no guidance on the properties of patient-specific treatment effect estimates in real-world scenarios in which essential heterogeneity is thought to exist a priori. For example, the effectiveness of surgery for patients with shoulder fractures is thought to vary with fracture complexity and patient resiliency, which in turn influence surgery choice [73,74,75,76,77], but fracture complexity and patient resiliency are not measurable in large observational databases such as Medicare claims data [73,74,75,76,77]. A study using a causal forest algorithm to estimate patient-specific surgery effects using Medicare claims data theorized a priori that the resulting estimates should be interpreted in terms of essential heterogeneity, but evidence was not available to guide these interpretations [78]. In addition, understanding the influence of essential heterogeneity on CFA estimates is especially relevant to researchers proposing to use CFAs in effectiveness-implementation hybrid study designs, in which the promotion of a treatment is randomized to satisfy assumption (I.1) but decision makers still have the discretion to choose among available treatments based on individual patient factors [79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95].
To provide this guidance, this study modified a treatment choice-based simulation method used in previous research to assess the impact of essential heterogeneity on patient-specific treatment effect estimates from a CFA estimator [43, 48, 53]. Eleven patient populations were simulated with the same distribution of true treatment effects drawn from identical distributions of simulated patient factors. All eleven simulations were specified to satisfy assumption (I.1). The simulations varied by plausible differences in the extent to which knowledge of true patient-specific treatment effects influenced treatment choice. We used the causal forest algorithm within the generalized random forests application (CFA-GRF) [24,25,26, 96, 97] to estimate patient-specific treatment effects for each simulated population. CFA-GRF has been singled out as the most appropriate CFA for estimating patient-specific treatment effects [98]. To tease out the influence of essential heterogeneity, we applied CFA-GRF to each simulated population under conditions of (1) fully observed heterogeneity, in which all patient factors associated with treatment effect heterogeneity are observed by the researcher, and (2) partially observed heterogeneity, in which only a subset of the patient factors associated with treatment effect heterogeneity are observed by the researcher. Patient-specific treatment effect estimates from CFA-GRF were used to calculate the average absolute and average percentage differences between true and estimated effects for each simulated population and for treatment choice-based population subsets.
Methods
Simulation model
Our simulation model follows the general framework in the essential heterogeneity literature [39, 43, 45, 48, 53, 99]. Figure 1 contains a directed acyclic graph (DAG) illustrating the conceptual framework of treatment effect heterogeneity, treatment choice, and outcome within our simulations. Figure 1 was adapted from standard DAG approaches to reflect patient factors affecting treatment effectiveness and the treatment effect knowledge of the decision maker [100, 101]. Outcome (Y_{i}) equals 1 if patient “i” is cured of the medical condition and 0 if not cured. P(Y_{i} | T_{i}, S_{i}) is the probability of cure for patient “i” conditional on treatment choice (T_{i}) and patient severity (S_{i}). Patient cure probability also varies with accumulated other factors (W_{i}). Treatment (T_{i}) equals 1 if the patient receives treatment and 0 otherwise, which we designate as watchful waiting. In all simulations, the true absolute treatment effect for each patient “i” (TE_{i}) on Y_{i} relative to watchful waiting varies with six factors X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, and X_{6i} based on the following equation:

TE_{i} = β_{1}X_{1i} + β_{2}X_{2i} + β_{3}X_{3i} + β_{4}X_{4i} + β_{5}X_{5i} + β_{6}X_{6i}
X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, and X_{6i} are binary variables, each distributed Bernoulli with a probability of 0.5 for each patient. Each β_{x} equals the absolute change in treatment effect if a patient has condition “X” (β_{1} = 0.024, β_{2} = 0.048, β_{3} = 0.071, β_{4} = 0.095, β_{5} = 0.119, β_{6} = 0.143). With these parameter values, simulated patients have true treatment effects ranging from 0 to 0.5, with an average true treatment effect of 0.25 for each simulated population. For example, if the simulated patient factors for patient “i” (X_{1i},X_{2i},X_{3i},X_{4i},X_{5i},X_{6i}) were (1,0,1,0,1,0), then patient “i’s” true TE_{i} was 0.214 = (0.024 + 0 + 0.071 + 0 + 0.119 + 0). Figure 2 illustrates the identical distribution of simulated treatment effects across all eleven simulations in this study.
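As an illustration of this setup, the six-factor treatment effect distribution can be reproduced with a few lines of code (a Python sketch for exposition only; the study itself used SAS and R):

```python
import random

# Per-factor absolute changes in treatment effect (beta_1 ... beta_6)
BETAS = [0.024, 0.048, 0.071, 0.095, 0.119, 0.143]

def draw_factors(rng):
    """Draw the six Bernoulli(0.5) patient factors X1..X6."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(6)]

def true_te(factors):
    """True treatment effect TE_i: sum of the betas for the factors present."""
    return sum(b * x for b, x in zip(BETAS, factors))

rng = random.Random(1)
tes = [true_te(draw_factors(rng)) for _ in range(50_000)]
# Effects range from 0 (no factors) to 0.5 (all six), averaging 0.25.
```

For the worked example in the text, `true_te([1, 0, 1, 0, 1, 0])` returns 0.024 + 0.071 + 0.119 = 0.214.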
The true cure probability relationship for each simulated patient “i”, signified by the red arrows in Fig. 1, is as follows:

P(Y_{i} | T_{i}, S_{i}) = α_{0} + α_{S}S_{i} + TE_{i}T_{i}
α_{0} equals the untreated patient cure probability at the mean severity level and was set to 0.1 in all simulations. Patient severity (S_{i}) was specified as a uniformly distributed random variable from −0.5 to 0.5. α_{S} equals the change in untreated patient cure probability for differences in severity level and was set to 0.1 in all simulations. As a result, in each simulated population, watchful waiting patients (T_{i} = 0) had a cure probability ranging from 0.05 to 0.15. Treated patients (T_{i} = 1) had a cure probability ranging from 0.05 to 0.65. All other unmeasured patient factors impacting the probability of a cure are captured in W_{i}.
The green arrows in Fig. 1 describe the treatment choice process that varied across the eleven simulations. In each simulation, it is assumed that the treatment decision maker observes X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, and X_{6i} and forms an expected treatment effect for patient “i”. The simulations differ by the knowledge available to decision makers of the relationship between the six patient factors and treatment effectiveness, as represented by the expected treatment effect function for simulation “j”:

ETE_{ij}(X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, X_{6i}, K_{j}) = K_{j}TE_{i} + (1 – K_{j})(0.25)
K_{j} ∈ (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) is the proportion of patient-specific TE_{i} knowledge used by decision makers in simulation “j” that is distinct from the average population treatment effect. Decision makers are more aware of each patient’s true treatment effect relative to the average population treatment effect as K_{j} increases from 0 to 1 across simulations. For example, in the simulation in which K_{j} = 0, decision makers only have knowledge of the average treatment effect across the population (0.25) when making treatment decisions for each patient. Alternatively, when K_{j} = 1, decision makers have exact knowledge of the treatment effect for patient “i” from observed X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, and X_{6i}. ETE_{ij}(X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, X_{6i}, K_{j}) is used to calculate the expected value of treatment for patient “i” based on the following:

EVT_{i}(ETE_{ij}, V, C, U_{i}) = V·ETE_{ij} – C + U_{i}
EVT_{i}(ETE_{ij}, V, C, U_{i}) sums the expected benefits and detriments (e.g., costs) of treatment relative to watchful waiting for patient “i”, conditional on knowledge K_{j}, X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, X_{6i}, direct treatment cost C, cure value V, and other accumulated factors U_{i} affecting treatment value that are independent of treatment effectiveness for patient “i”. ETE_{ij}(X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, X_{6i}, K_{j}) equals the decision maker’s expected change in cure probability from treatment. To focus this study on the impact of essential heterogeneity across simulations, all patients were assigned a cure value V of $800 and a treatment cost C of $200. These values were chosen because they yield simulated population treatment percentages of approximately 50%. Cure values V of $500 and $1100 were also tried, which yielded different population treatment percentages but did not change the interpretation of our results with respect to essential heterogeneity. U_{i} is the source of treatment valuation that varies across patients, is unrelated to treatment effectiveness, and is unmeasured by the researcher. U_{i} values were assigned to patients from a normal distribution with a mean of zero and a common variance \({\sigma }_{U}^{2}\) across simulations. Furthermore, in all simulations, U_{i} was specified independently of W_{i} so that the unmeasured factors influencing treatment choice had no relationship with the unmeasured factors directly affecting cure, satisfying ignorability assumption (I.1).
In all simulations, decision makers chose treatment for patient “i” if EVT_{i} was positive and watchful waiting if EVT_{i} was negative. In the simulation in which knowledge of patient-specific treatment effect heterogeneity is zero (K_{j} = 0), only variation in U_{i} leads to different treatment choices across simulated patients. As K_{j} increases across simulations, a larger proportion of the variation in treatment choice is attributable to treatment effectiveness, or sorting on the gain. Once a treatment was chosen for each patient, cure (Y_{i}) was simulated as a Bernoulli draw from P(Y_{i} | T_{i}, S_{i}). Table 1 summarizes the model parameters and values used in the simulations.
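The choice-and-outcome process just described can be sketched end to end as follows. This is an illustrative Python translation, not the study’s code: the linear forms of ETE_{ij} and EVT_{i} follow our reading of the equations above, and the standard deviation of U_{i} is an arbitrary placeholder since the study specifies only a common, unreported variance.

```python
import random

ATE = 0.25        # population average treatment effect
V, C = 800, 200   # cure value and direct treatment cost (Table 1)
SIGMA_U = 50      # sd of U_i -- placeholder; only a common variance is specified

def simulate_patient(te_i, s_i, k_j, rng):
    """Simulate one patient's treatment choice and cure outcome."""
    ete = k_j * te_i + (1 - k_j) * ATE           # expected treatment effect
    evt = V * ete - C + rng.gauss(0, SIGMA_U)    # expected value of treatment
    t_i = 1 if evt > 0 else 0                    # treat only when EVT > 0
    p_cure = 0.1 + 0.1 * s_i + te_i * t_i        # P(Y | T, S)
    y_i = 1 if rng.random() < p_cure else 0      # Bernoulli cure draw
    return t_i, y_i

# With K_j = 0, V*ETE - C = 800*0.25 - 200 = 0, so EVT reduces to U_i
# and roughly half of all patients are treated regardless of TE_i.
```

With `k_j = 1` the same function sorts patients on the gain: patients with TE_{i} well above 0.25 are almost always treated and those well below are almost never treated.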
To support large sample properties, we generated 50,000 patients in each simulation. The blue arrows in Fig. 1 describe the variables observed by the researcher after each simulation. By varying the knowledge of TE_{i} across simulations with K_{j} and varying the patient factors observed by the researcher, we can tease out the impacts of essential heterogeneity on patient-specific treatment effect estimates. In each scenario, researchers observe T_{i}, Y_{i}, and S_{i}. We designate “fully observed heterogeneity” as the empirical condition in which researchers observe all six patient factors X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, and X_{6i}. We designate “partially observed heterogeneity” as the empirical condition in which researchers observe only X_{1i}, X_{2i}, X_{3i}, and X_{4i}. Under fully observed heterogeneity, treatment effects are homogeneous within each reference class spanned by combinations of the complete set of patient factors. When K_{j} = 0, decision makers are not knowledgeable of the sources of treatment effect heterogeneity, and treatment choice varies only with U_{i}. Under fully observed heterogeneity with K_{j} > 0, decision makers are at least partly knowledgeable of the sources of treatment effect heterogeneity, with the effect of this knowledge on treatment choice increasing with K_{j}. Under partially observed heterogeneity, treatment effects are heterogeneous within the reference classes defined by the observed set of patient factors. Partially observed heterogeneity with K_{j} = 0 has been dubbed nonessential heterogeneity in the econometric literature [38, 39]. Under nonessential heterogeneity, treatment choice is not influenced by the unmeasured patient factors affecting treatment effectiveness within a reference class. Scenarios with partially observed heterogeneity and K_{j} > 0 represent essential heterogeneity.
In these scenarios, treatment effects are heterogeneous within each reference class, with the influence of treatment effect heterogeneity on treatment choice increasing with K_{j} across simulations.
Estimation methods
Simulated population summaries
Treatment effect estimation using observational data requires what is called a common area of support, or overlap, between treated and untreated patients: patients with the same measured patient factors must be observed to make different treatment choices [102, 103]. It has been shown that including patients with insufficient overlap in study populations can lead to biased treatment effect estimates [104, 105]. The treatment choice-based simulations used here naturally reduce overlap as treatment choice becomes more strongly influenced by the patient factors affecting treatment effectiveness. To monitor this influence across simulations, we used the SAS PROC LOGISTIC procedure to estimate the treatment propensity score for each patient in each simulated population under both “fully observed heterogeneity” and “partially observed heterogeneity”. Each simulated patient was then assigned to either the “overlapped” subset, with a propensity score between 0.05 and 0.95, or the “non-overlapped” subset, with a propensity score below 0.05 or above 0.95 [104, 105]. We then estimated the percentage of patients in each simulated population who were treated, untreated, overlapped and treated, overlapped and untreated, non-overlapped and treated, and non-overlapped and untreated, and calculated the true average TE_{i} in each subset.
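The overlap designation can be illustrated with a simplified stand-in for the logistic propensity model: because the simulated patient factors are discrete, an empirical cell-frequency propensity conveys the same idea. This is a Python sketch with a hypothetical `overlap_flags` helper; the study itself used SAS PROC LOGISTIC with S_{i} included as a continuous covariate.

```python
from collections import defaultdict

def overlap_flags(patterns, treatments, lo=0.05, hi=0.95):
    """Flag each patient as overlapped when the empirical treatment
    propensity of their covariate cell lies strictly inside (lo, hi)."""
    cells = defaultdict(lambda: [0, 0])   # pattern -> [n patients, n treated]
    for x, t in zip(patterns, treatments):
        cells[x][0] += 1
        cells[x][1] += t
    return [lo < cells[x][1] / cells[x][0] < hi for x in patterns]
```

Cells in which nearly all patients make the same choice, as happens when treatment effectiveness strongly drives treatment choice, are flagged non-overlapped and reported separately.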
Next, for each simulated population, we estimated a linear probability model (LPM) of treatment choice T_{i} on true TE_{i} using the SAS PROC REG procedure with the SCORR1 option. This procedure provides the percentage of treatment choice variation within the simulated population that is attributable to variation in the true treatment effect, serving as a measure of the influence of the true treatment effect on treatment choice. Last, we estimated the effect of T_{i} and S_{i} on Y_{i} using an LPM in each simulated population. The parametric treatment effect literature shows that the LPM estimator of the parameter on T_{i} yields a consistent estimate of the average absolute treatment effect on the treated in each simulated population [43, 48,49,50, 54, 57, 60, 68, 69].
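The claim that the LPM coefficient on T_{i} recovers the ATT under sorting on the gain can be checked with a toy simulation (illustrative Python with arbitrary parameter values, not the study’s SAS code; because the baseline cure probability is held constant here, the LPM coefficient on T_{i} reduces to a difference in mean outcomes):

```python
import random

rng = random.Random(3)
n = 200_000

# True effects vary across patients; the baseline cure probability is a
# constant 0.1, so there is no omitted-variable bias by construction.
te = [rng.uniform(0.0, 0.5) for _ in range(n)]
# Sorting on the gain: treatment is chosen when 800*TE - 200 + U > 0.
t = [1 if 800 * x - 200 + rng.gauss(0, 50) > 0 else 0 for x in te]
y = [1 if rng.random() < 0.1 + x * ti else 0 for x, ti in zip(te, t)]

n_treated = sum(t)
# LPM of Y on T alone = difference in mean outcomes by treatment status.
lpm_te = (sum(yi for yi, ti in zip(y, t) if ti) / n_treated
          - sum(yi for yi, ti in zip(y, t) if not ti) / (n - n_treated))
# Average true effect among the treated (the ATT).
att = sum(x for x, ti in zip(te, t) if ti) / n_treated
# lpm_te tracks att (well above the population mean of 0.25), not the
# lower average effect among untreated patients.
```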
Causal forest algorithm
We then applied CFA-GRF [24,25,26, 96, 97] using the “grf” package in R [106] to estimate treatment effects for each patient in each simulated population. CFA-GRF evolved from standard classification and regression tree (CART) and random forest ensemble methods [24,25,26, 96, 97]. CART procedures iteratively partition “nodes” of observations within a population into subnodes or “branches” based on measured factors in a manner that maximizes the differences in an outcome across possible branches [97]. A tree is formed from the full set of branches partitioning the study population. The final subnode, or leaf, at the end of a branch can be thought of as an algorithm-generated ex post reference class for observations with factors matching the leaf. The random forest approach is an ensemble method that generates a “forest” of CART trees through resampling from the study population [96]. The estimated outcome for a single observation is the average outcome across the leaves of the trees in the forest containing that observation. CFA-GRF extends the random forest approach to the goal of estimating the causal effect of a predictor of interest (e.g., a treatment) on an outcome. CFA-GRF partitions observations based on measured factors in a manner that maximizes the expected differences in the estimated treatment effect on an outcome [24,25,26]. For each simulated population, CFA-GRF was run using 4000 trees, a minimum leaf size of 50, and the “honest” approach suggested by the algorithm creators, in which trees were estimated using a randomly selected 25% of the simulated population [26]. We ran CFA-GRF specifying X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5i}, X_{6i}, and S_{i} in the “fully observed heterogeneity” specification and X_{1i}, X_{2i}, X_{3i}, X_{4i}, and S_{i} in the “partially observed heterogeneity” specification. As a result, each patient in each simulated population had two treatment effect estimates.
We assessed the properties of these estimates by evaluating their ability to identify average treatment effect parameters for each simulated population and for treatment choice-based subsets of each population. We calculated the average absolute and average percentage differences between the true treatment effect for each simulated patient (TE_{i}) and the estimated treatment effects for the full population and for subsets of the population based on treatment choice and propensity score “overlap” status.
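These two metrics can be written compactly. In this sketch, “absolute” denotes the signed difference on the cure probability scale, which is our reading of the worked calculations reported later in the Results, and the percentage difference scales it by the subset’s average true effect; the `avg_diffs` helper name is ours.

```python
def avg_diffs(true_effects, est_effects):
    """Return (average absolute difference, average percentage difference)
    between estimated and true treatment effects for a patient subset."""
    n = len(true_effects)
    avg_abs = sum(e - t for t, e in zip(true_effects, est_effects)) / n
    avg_true = sum(true_effects) / n
    return avg_abs, 100 * avg_abs / avg_true
```

For example, a subset with an average true effect of 0.25 and estimates averaging 0.0014 higher yields approximately (0.0014, 0.56), i.e., a 0.56% average overstatement.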
Results
Summary information across simulated populations
Table 2 summarizes each simulated population. Column A in Table 2 shows the proportion of treatment effect expectations (ETE_{ij}) shaped by the true effect for each patient (TE_{i}) in each simulation – K_{j} from Eq. (3). Column B shows the percentage of treatment choice variation in each simulation explained by TE_{i}. Columns C and D show the percentage of simulated patients who overlapped – i.e., had propensity scores between 0.05 and 0.95 – in the fully observed heterogeneity and partially observed heterogeneity scenarios, respectively. Columns E through J show the true average TE_{i} for the subsets of treated, untreated, overlapped and treated, overlapped and untreated, non-overlapped and treated, and non-overlapped and untreated patients, respectively. These columns also show in parentheses the percentage of patients within each subset.
Patient-specific treatment effects (TE_{i}) do not influence treatment choice in simulation 1, and as a result, the average true TE_{i} is close to the true population average treatment effect of 0.25 for both treated and untreated patients. Moving from simulations 2 through 11, though, knowledge of TE_{i} plays a growing role in decision making, and TE_{i} explains a larger portion of the variation in treatment choice (column B). Under fully observed heterogeneity, all patients are fully overlapped in simulations 1 through 6. The percentage of overlapping patients falls from 97.0% to 68.8% across simulations 7 through 11. Under partially observed heterogeneity, all patients overlapped in all simulations. Columns E and F show how the greater influence of TE_{i} on treatment choice leads to sorting on the gain. The average TE_{i} for treated patients in column E increased from 0.250 to 0.329 as K_{j} increased from 0 to 1, while the average TE_{i} for untreated patients in column F fell from 0.251 to 0.172 across this range. Columns G through J stratify treated and untreated patients by overlap status under fully observed heterogeneity. The average TE_{i} of non-overlapped treated patients (column I) is greater than that of overlapped treated patients (column G). Likewise, the average TE_{i} of non-overlapped untreated patients (column J) is less than that of overlapped untreated patients (column H). Column K of Table 2 shows the estimated treatment effect for the full population in each simulation using a linear probability model (LPM). A comparison of these estimates with column E confirms that the LPM yields estimates of the average treatment effect on the treated (ATT) [57]. When treatment effects are heterogeneous, LPM estimates appropriately generalize to untreated patients only when TE_{i} does not influence treatment choice, as in simulation 1 [57].
CFAGRF results under fully observed heterogeneity
Table 3 contains the average percentage differences between the true treatment effects and the individual treatment effect estimates from CFA-GRF for each of the eleven simulated populations under fully observed heterogeneity. Estimates are reported for the full population in each simulation and for treatment choice-based subsets. Table A.1 in Additional file 1 shows these results in terms of average absolute differences between the true and estimated treatment effects. The percentage differences in Table 3 were calculated using the average true treatment effect for each population subset found in Table 2 and the average absolute differences for each subset in Table A.1. For example, the average percentage difference between the estimated and true treatment effect values for the full population in simulation 1 under fully observed heterogeneity is 100*(0.0014)/0.25 = 0.56%. Column E of Table 3 shows that, under fully observed heterogeneity, CFA-GRF on average produces treatment effect estimates that reflect each population across simulations. However, as treatment choice becomes more responsive to TE_{i}, CFA-GRF estimates increasingly understate the true treatment effect for treated patients and overstate the true treatment effect for untreated patients. Simulation 1 under fully observed heterogeneity fully satisfies ignorability, and CFA-GRF produces patient-specific treatment effect estimates that on average reflect the true patient treatment effects for the entire population and for both treated and untreated patient subsets. In contrast, in simulation 11, in which decision makers have full knowledge of TE_{i} in treatment choice, the treatment effect estimates for treated patients are on average 14.74% lower than the truth, and the estimated treatment effects for untreated patients are on average 30.99% higher than the truth. These percentage differences are not symmetric because untreated patients have a lower average true treatment effect.
Columns G through J in simulations 6 through 11 demonstrate that these differences exist for both overlapped and non-overlapped patients but are more pronounced for non-overlapped patients.
CFAGRF results under partially observed heterogeneity
Table 4 contains the average percentage differences between the true treatment effect values and the CFA-GRF treatment effect estimates for each simulated population under partially observed heterogeneity. Under partially observed heterogeneity, all patients are overlapped, so columns G through J of Table 3 are unnecessary. Under ignorability in simulation 1, CFA-GRF again produces estimates that on average are close to the true patient treatment effects for the entire population and for the treated and untreated patient subsets. In simulation 1, CFA-GRF estimates under partially observed heterogeneity had larger standard errors than those under fully observed heterogeneity (see Table A.2). Treatment effects estimated from CFA-GRF for treated patients closely reflect their true values across all eleven simulations. In contrast, CFA-GRF estimates for untreated patients are higher than their true values across simulations 2 through 11, with the differences increasing with the level of TE_{i} influence on treatment choice. For example, based on the true average treatment effect for untreated patients from Table 2 and the average absolute differences for each population in Table A.1, CFA-GRF estimates for untreated patients are on average 2.4% greater than their true values in simulation 2 – 100*(0.006)/(0.246) – and 76.3% greater than their true values in simulation 11 – 100*(0.1312)/(0.172). As a result, when TE_{i} influences treatment choice under partially observed heterogeneity, CFA-GRF estimated treatment effects across the whole population are on average greater than their true values.
Discussion
Causal forest algorithms (CFAs) have been proposed to estimate patient-specific treatment effect evidence using observational data [23,24,25,26,27,28,29,30,31,32,33, 107]. To apply CFAs, observational databases must contain patients with similar combinations of measured factors who were observed to make different treatment choices. The positive properties of CFAs for estimating patient-specific treatment effects have been established using simulation models under the assumption of ignorability [26,27,28,29, 34,35,36]. Under ignorability, only the treatment variation from unobserved patient factors not associated with treatment effect heterogeneity is available to estimate patient-specific treatment effects. Therefore, it is unknown whether the positive properties of CFAs extend to real-world clinical applications in which patient factors affecting treatment effectiveness also influence treatment choice. In many real-world clinical scenarios it is plausible, and even likely, that observed treatment choices reflect unmeasured patient factors related to expected treatment effectiveness for each patient – a condition defined in the econometric literature as essential heterogeneity [38, 39, 43, 48,49,50, 53]. This paper used simulations that varied only by the relationship between treatment effectiveness and treatment choice to assess the impact of essential heterogeneity on the ability of CFAs to estimate patient-specific treatment effects. The causal forest algorithm within the generalized random forests application (CFA-GRF) has been singled out as the most appropriate CFA for estimating patient-specific treatment effects and was used here [98]. To tease out the impacts of essential heterogeneity, CFA-GRF estimates were evaluated in settings in which all patient factors associated with treatment effect heterogeneity were fully observed by the researcher and in settings in which those factors were only partially observed.
We replicated the positive properties of CFAGRF in simulation scenarios under ignorability. Under ignorability, CFAGRF yielded average population-wide estimates, and average estimates for patient subsets defined by treatment choice, that were closely aligned with their true values whether heterogeneity was fully or partially observed within the algorithm. As a result, if researchers can make a strong conceptual case a priori that treatment effectiveness is unrelated to treatment choice, they can be confident that CFAGRF will yield appropriate treatment effect estimates across a population of patients. In simulation scenarios in which decision-makers use patient factors associated with treatment effectiveness when making treatment decisions [38, 39, 43, 48,49,50, 53], the ability of CFAGRF to identify patient-specific treatment effects varied with the influence that treatment effectiveness had on treatment choice and with whether the full range of patient factors associated with treatment effect heterogeneity was observed and specified in the algorithm. When all patient factors affecting treatment effect heterogeneity were fully specified, CFAGRF produced treatment effect estimates that reflected true treatment effects across each population subset when the influence of treatment effectiveness on treatment choice was low. As this influence increased, however, treatment effect estimates showed increasingly negative bias for treated patients and positive bias for untreated patients. A substantial portion of this bias is likely attributable to non-overlapping patients becoming a larger percentage of the population as the influence of treatment effectiveness on treatment choice increases. Under partially observed heterogeneity, all patients overlapped in all simulations, and CFAGRF produced estimates that closely reflected the true treatment effect values for treated patients across all levels of influence of treatment effectiveness on treatment choice.
In contrast, CFAGRF estimates for untreated patients were biased high, with the extent of this bias increasing with the level of influence that treatment effectiveness had on treatment choice.
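The loss of overlap noted above can likewise be sketched. Under a hypothetical probit choice model (an assumption of this illustration, not the paper's specification), increasing the weight on TE_i in the choice rule pushes more patients' true treatment propensities outside commonly used overlap bounds:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
te = rng.normal(0.2, 0.1, size=50_000)  # illustrative TE_i distribution

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return np.array([0.5 * (1 + erf(v / sqrt(2))) for v in x])

# `influence` is a hypothetical knob standing in for how strongly
# TE_i drives treatment choice across the paper's simulations.
for influence in (0.0, 2.0, 8.0, 20.0):
    p = norm_cdf(influence * te)          # true propensity P(treated | TE_i)
    outside = np.mean((p < 0.1) | (p > 0.9))
    print(f"influence={influence:>4}: share outside [0.1, 0.9] = {outside:.1%}")
```

As the loop's output shows, the share of patients with near-deterministic treatment propensities grows with the weight on TE_i, leaving little comparable treatment variation with which to estimate effects for those patients.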
As a result, CFAGRF estimates of patient-specific treatment effects using observational data must be assessed through the prism of the assumed reasons why patients with similar measured factors in a real-world setting were observed making different treatment choices. This requires researchers to develop explicit conceptual frameworks of treatment choice a priori to support these assumptions and to ensure proper interpretation of treatment effect estimates ex post. The call for treatment choice conceptual frameworks to guide treatment effectiveness research using observational data has long been made in economics [44, 48, 49, 108,109,110], and the importance of these frameworks is now being more widely appreciated [21, 111, 112]. A conceptual framework of treatment choice should describe the factors thought to influence treatment choice, the relationship of these factors to treatment effectiveness, and whether these factors are measured within the available data. Given the study findings, it is important that researchers qualify patient-specific estimates from CFAGRF in clinical scenarios in which essential heterogeneity likely exists. In these scenarios researchers should state that patient-specific estimates from CFAGRF are likely biased high for the average patient with a given combination of measured patient factors and are best aligned to those patients a provider is more likely to treat.
This study is limited by its use of only one of the several CFAs available to produce patient-specific evidence using observational data. While CFAGRF was singled out as most appropriate for estimating patient-specific treatment effects [98], other CFAs may be able to incorporate and correct for the conditions associated with treatment choice when estimating treatment effects. To this end, the simulated datasets produced here are available from the authors so that other CFA developers can assess how the influence of treatment effect heterogeneity on treatment choice affects their treatment effect estimates. In addition, the simulation approach in this paper is reported fully, is straightforward to reproduce, and is easy to modify, so researchers can assess the robustness of our results to parameter changes.
Conclusion
The acknowledged breadth of treatment effect heterogeneity across patients heightens the need for empirical approaches that produce patient-specific treatment effect evidence [4,5,6,7,8,9,10]. Causal forest algorithms (CFAs) have been proposed to exploit the treatment variation found within large observational databases to develop patient-specific evidence [23,24,25,26,27,28,29,30,31,32,33]. The simulation results in this paper show that the patient-specific estimates produced by a CFA are sensitive to the reasons why patients with the same set of measured factors were observed to make different treatment choices. In many real-world clinical scenarios it is likely that decision-makers are cognizant of how patient factors affect treatment effectiveness and use this information when making treatment decisions [38, 39, 43, 48,49,50, 53]. Indeed, many real-world decision-makers may know more about the patient factors affecting treatment effectiveness than the researchers who collect measures for research [22, 113, 114]. As a result, it is foundational that researchers using CFAs to estimate patient-specific evidence from observational data build conceptual frameworks of treatment choice prior to estimation to guide estimate interpretation ex post.
Availability of data and materials
No datasets were generated or analysed during the current study.
Abbreviations
CFA: Causal forest algorithm
ATT: Average treatment effect on the treated
DAG: Directed acyclic graph
CART: Classification and regression tree
CFAGRF: Causal forest algorithm – generalized random forest application
References
Patient Centered Outcomes Research Institute. Our Programs. https://www.pcori.org/aboutus/ourprograms. Published 2017. Accessed 20 Mar 2019.
Selby JV, Whitlock EP, Sherman KS, Slutsky JR. The Role of Comparative Effectiveness Research. In: Gallin JL, Ognibene FP, Johnson LL, editors. Principles and Practice of Clinical Research. 4th ed. London, UK: Elsevier; 2018. p. 269–92.
Selby JV, Beal AC, Frank L. The Patient-Centered Outcomes Research Institute (PCORI) national priorities for research and initial research agenda. JAMA. 2012;307(15):1583–4.
Kravitz RL, Duan N, Braslow J. Evidencebased medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q. 2004;82(4):661–87.
Lohr KN, Eleazer K, Mauskopf J. Health policy issues and applications for evidence-based medicine and clinical practice guidelines. Health Policy. 1998;46:1–19.
Rothwell PM. Subgroup analysis in randomized controlled trials: importance, indications, and interpretation. Lancet. 2005;365:176–86.
Starfield B. Threads and yarns: weaving the tapestry of comorbidity. Ann Fam Med. 2006;4(2):101–3.
Steinberg EP, Luce BR. Evidence based? Caveat emptor! Health Affair. 2005;24(1):80–92.
Upshur REG. Looking for rules in a world of exceptions. Perspect Biol Med. 2005;48(4):477–89.
Dubois RW. From methods to policy: a “one-size-fits-all” policy ignores patient heterogeneity. J Comp Eff Res. 2012;1(2):119–20.
Kent DM, Paulus JK, van Klaveren D, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann Intern Med. 2020;172(1):35–45.
Deaton A, Cartwright N. Understanding and misunderstanding randomized controlled trials. Soc Sci Med. 2018;210:2–21.
Concato J, Horwitz RI. Randomized trials and evidence in medicine: A commentary on deaton and cartwright. Soc Sci Med. 2018;210:32–6.
Rekkas A, Paulus JK, Raman G, et al. Predictive approaches to heterogeneous treatment effects: a scoping review. BMC Med Res Methodol. 2020;20(1):264.
Sox HC, Goodman SN. The methods of comparative effectiveness research. Annu Rev Publ Health. 2012;33:425–45.
Kowalski CJ, Mrdjenovich AJ. Comparative effectiveness research: decisionbased evidence. Perspect Biol Med. 2014;57(2):224–48.
Dahabreh IJ, Hayward R, Kent DM. Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence. Int J Epidemiol. 2016;45(6):2184–93.
Kent DM, Steyerberg E, van Klaveren D. Personalized evidence based medicine: predictive approaches to heterogeneous treatment effects. BMJ. 2018;363:k4245.
Kent DM, van Klaveren D, Paulus JK, et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) statement: explanation and elaboration. Ann Intern Med. 2020;172(1):W1–25.
Wiemken TL, Kelley RR. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health. 2020;41:21–36.
Crown WH. Real-world evidence, causal inference, and machine learning. Value Health. 2019;22(5):587–92.
Dekkers OM, Mulder JM. When will individuals meet their personalized probabilities? A philosophical note on risk prediction. Eur J Epidemiol. 2020;35(12):1115–21.
Athey S. Beyond prediction: using big data for policy problems. Science. 2017;355(6324):483–5.
Athey S, Tibshirani J, Wager S. Generalized random forests. Ann Stat. 2019;47(2):1148–78.
Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci. 2016;113(27):7353–60.
Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc. 2018;113(523):1228–42.
Bargagli-Stoffi FJ, De Witte K, Gnecco G. Heterogeneous causal effects with imperfect compliance: a novel Bayesian machine learning approach. arXiv preprint arXiv:1905.12707. 2019.
Bargagli-Stoffi FJ, Gnecco G. Estimating heterogeneous causal effects in the presence of irregular assignment mechanisms. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); 2018.
Johnson M, Cao J, Kang H. Detecting heterogeneous treatment effect with instrumental variables. arXiv preprint arXiv:1908.03652. 2019.
Bargagli-Stoffi FJ, Gnecco G. Causal tree with instrumental variable: an extension of the causal tree framework to irregular assignment mechanisms. Int J Data Sci Anal. 2020;9(3):315–37.
Wang G, Li J, Hopp WJ. An instrumental variable forest approach for detecting heterogeneous treatment effects in observational studies. Manag Sci. 2021. https://doi.org/10.1287/mnsc.2021.4084.
Dusseldorp E, Doove L, Mechelen I. Quint: An R package for the identification of subgroups of clients who differ in which treatment alternative is best for them. Behav Res Methods. 2016;48(2):650–63.
Su XG, Tsai CL, Wang HS, Nickerson DM, Li BG. Subgroup analysis via recursive partitioning. J Mach Learn Res. 2009;10:141–58.
Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. P Natl Acad Sci USA. 2016;113(27):7353–60.
Wendling T, Jung K, Callahan A, Schuler A, Shah NH, Gallego B. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases. Stat Med. 2018;37(23):3309–24.
Hahn PR, Dorie V, Murray JS. Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017. 2019:arXiv:1905.09515. https://doi.org/10.48550/arXiv.1905.09515. Accessed 1 May 2019.
Jawadekar N, Kezios K, Odden MC, et al. Practical guide to honest causal forests for identifying heterogeneous treatment effects. Am J Epidemiol. 2023;192(7):1155–65.
Basu A, Heckman JJ, Navarro-Lozano S, Urzua S. Use of instrumental variables in the presence of heterogeneity and self-selection: an application to treatments of breast cancer patients. Health Econ. 2007;16(11):1133–57.
Heckman JJ, Urzua S, Vytlacil E. Understanding instrumental variables in models with essential heterogeneity. Rev Econ Stat. 2006;88(3):389–432.
Basu A. Estimating DecisionRelevant Comparative Effects Using Instrumental Variables. Stat Biosci. 2011;3(1):6–27.
Ravallion M. On the implications of essential heterogeneity for estimating causal impacts using social experiments. J Econ Methods. 2015;4(1):145–51.
Heckman J, Pinto R. The econometric model for causal policy analysis. Annu Rev Econom. 2022;14(1):893–923.
Brooks JM, Chapman CG, Schroeder MC. Understanding treatment effect estimates when treatment effects are heterogeneous for more than one outcome. Appl Health Econ Health Policy. 2018;16(3):381–93.
Heckman JJ. Econometric causality. Int Stat Rev. 2008;76(1):1–27.
Heckman JJ, Vytlacil E. Structural equations, treatment effects, and econometric policy evaluation. Econometrica. 2005;73(3):669–738.
Heckman JJ, Vytlacil EJ. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci U S A. 1999;96(8):4730–4.
Basu A. Person-centered treatment (PeT) effects: individualized treatment effects using instrumental variables. Stata J. 2015;15(2):397–410.
Brooks JM, Fang G. Interpreting treatment-effect estimates with heterogeneity and choice: simulation model results. Clin Ther. 2009;31(4):902–19.
Garrido MM, Dowd B, Hebert PL, Maciejewski ML. Understanding treatment effect terminology in pain and symptom management research. J Pain Symptom Manage. 2016;52(3):446–52.
Smith J, Sweetman A. Viewpoint: estimating the causal effects of policies and programs. Can J Econ. 2016;49(3):871–905.
Heckman JJ. Micro data, heterogeneity, and the evaluation of public policy: nobel lecture. J Polit Econ. 2001;109(4):673–748.
Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, New Jersey: Princeton University Press; 2009.
Chapman CG, Brooks JM. Treatment effect estimation using nonlinear twostage instrumental variable estimators: another cautionary note. Health Serv Res. 2016;51(6):2375–94.
Brooks JM, Chrischilles EA. Heterogeneity and the interpretation of treatment effect estimates from risk adjustment and instrumental variable methods. Med Care. 2007;45(10 Suppl 2):S123–30.
Angrist JD, Fernández-Val I. ExtrapoLATE-ing: external validity and overidentification in the LATE framework. In: Acemoglu D, Arellano M, Dekel E, editors. Advances in Economics and Econometrics, Vol. III: Econometrics; 2013. p. 401–33.
Angrist JD. Treatment effect heterogeneity in theory and practice. Econ J. 2004;114:C52–83.
Heckman JJ, Robb R. Alternative Methods for Evaluating the Impact of Interventions. In: Heckman JJ, Singer B, editors. Longitudinal Analysis of Labor Market Data. New York: Cambridge University Press; 1985. p. 156–245.
Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994;62(2):467–75.
Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91(434):444–55.
Angrist JD. Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice. J Business Econ Statistics. 2001;19(1):2–16.
Moler-Zapata S, Grieve R, Basu A, O’Neill S. How does a local instrumental variable method perform across settings with instruments of differing strengths? A simulation study and an evaluation of emergency surgery. Health Econ. 2023;32(9):2113–26.
Brooks JM, Chapman CG, Cozad MJ. The identification process using choice theory is needed to match design with objectives in CER. Med Care. 2017;55(2):91–3.
Cozad MJ, Chapman CG, Brooks JM. Specifying a conceptual treatment choice relationship before analysis is necessary for comparative effectiveness research. Med Care. 2016;55(2):94–6.
Heckman JJ. The scientific model of causality. Sociol Methodol. 2005;35:1–97.
Angrist JD. Treatment effect heterogeneity in theory and practice. Econ J. 2003;114:1–30.
Manski CF. [Choices as an alternative to control in observational studies]: comment. Stat Sci. 1999;14(3):279–81.
Harris KM, Remler DK. Who is the marginal patient? understanding instrumental variables estimates of treatment effects. Health Serv Res. 1998;33(5):1337–60.
Heckman JJ, Robb R. Alternative methods for evaluating the impact of interventions  an overview. J Econ. 1985;30(1–2):239–67.
Blundell R, Costa DM. Evaluation methods for nonexperimental data. Fisc Stud. 2000;21(4):427–68.
Smith J. Treatment effect heterogeneity. Eval Rev. 2022;46(5):652–77.
Brooks JM, Chrischilles EA. Heterogeneity and the interpretation of treatment effect estimates from risk adjustment and instrumental variable methods. Med Care. 2007;45(10):S123–30.
Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
Jayakumar P, Teunis T, Williams M, Lamb SE, Ring D, Gwilym S. Factors associated with the magnitude of limitations during recovery from a fracture of the proximal humerus. Bone Joint J. 2019;101(6):715–23.
Otlans PT, Szukics PF, Bryan ST, Tjoumakaris FP, Freedman KB. Current concepts review: resilience in the orthopaedic patient. J Bone Joint Surg Am. 2021;103(6):549–59.
Ezeamama AE, Elkins J, Simpson C, Smith SL, Allegra JC, Miles TP. Indicators of resilience and healthcare outcomes: findings from the 2010 health and retirement survey. Qual Life Res. 2016;25(4):1007–15.
Floyd SB, Walker JT, Smith JT, et al. ICD10 diagnosis codes in electronic health records do not adequately capture fracture complexity for proximal humerus fractures. J Shoulder Elbow Surg. 2023;33(2):417–24.
Floyd SB, Thigpen C, Kissenberth M, Brooks JM. Association of surgical treatment with adverse events and mortality among medicare beneficiaries with proximal humerus fracture. JAMA Netw Open. 2020;3(1):e1918663.
Brooks JM, Chapman CG, Floyd SB, Chen BK, Thigpen CA, Kissenberth M. Assessing the ability of an instrumental variable causal forest algorithm to personalize treatment evidence using observational data: the case of early surgery for shoulder fracture. BMC Med Res Methodol. 2022;22(1):190.
Curran GM, Bauer M, Mittman B, Pyne JM, Stetler C. Effectiveness-implementation hybrid designs: combining elements of clinical effectiveness and implementation research to enhance public health impact. Med Care. 2012;50(3):217–26.
Landes SJ, McBain SA, Curran GM. An introduction to effectiveness-implementation hybrid designs. Psychiatry Res. 2019;280:112513.
Curran GM, Landes SJ, McBain SA, et al. Reflections on 10 years of effectiveness-implementation hybrid studies. Front Health Serv. 2022;2:1053496.
Wolfenden L, Williams CM, Wiggers J, Nathan N, Yoong SL. Improving the translation of health promotion interventions using effectiveness-implementation hybrid designs in program evaluations. Health Promot J Austr. 2016;27(3):204–7.
Bernet AC, Willens DE, Bauer MS. Effectiveness-implementation hybrid designs: implications for quality improvement science. Implement Sci. 2013;8(1):S2.
Ullman AJ, Beidas RS, Bonafide CP. Methodological progress note: hybrid effectiveness-implementation clinical trials. J Hosp Med. 2022;17(11):912–6.
Liang YY, Ehler BR, Hollenbeak CS, Turner BJ. Behavioral support intervention for uncontrolled hypertension: a complier average causal effect (CACE) analysis. Med Care. 2015;53(2):e9–15.
Peugh JL, Strotman D, McGrady M, Rausch J, Kashikar-Zuck S. Beyond intent to treat (ITT): a complier average causal effect (CACE) estimation primer. J Sch Psychol. 2017;60:7–24.
Knox CR, Lall R, Hansen Z, Lamb SE. Treatment compliance and effectiveness of a cognitive behavioural intervention for low back pain: a complier average causal effect approach to the BeST data set. BMC Musculoskelet Disord. 2014;15:1–1.
Berg JK, Bradshaw CP, Jo B, Ialongo NS. Using Complier average causal effect estimation to determine the impacts of the good behavior game preventive intervention on teacher implementers. Adm Policy Ment Health. 2017;44(4):558–71.
Gruber JS, Arnold BF, Reygadas F, Hubbard AE, Colford JM Jr. Estimation of treatment efficacy with complier average causal effects (CACE) in a randomized stepped wedge trial. Am J Epidemiol. 2014;179(9):1134–42.
Connell AM. Employing complier average causal effect analytic methods to examine effects of randomized encouragement trials. Am J Drug Alcohol Abuse. 2009;35(4):253–9.
Ashworth E, Panayiotou M, Humphrey N, Hennessey A. Game on: complier average causal effect estimation reveals sleeper effects on academic attainment in a randomized trial of the good behavior game. Prev Sci. 2020;21(2):222–33.
Panayiotou M, Humphrey N, Hennessey A. Implementation matters: using complier average causal effect estimation to determine the impact of the promoting alternative thinking strategies (PATHS) curriculum on children’s quality of life. J Educ Psychol. 2020;112(2):236–53.
Carmody T, Greer TL, Walker R, Rethorst CD, Trivedi MH. A complier average causal effect analysis of the stimulant reduction intervention using dosed exercise study. Cont Clin Trial Comm. 2018;10:1–8.
Huang S, Cordova D, Estrada Y, Brincks AM, Asfour LS, Prado G. An application of the complier average causal effect analysis to examine the effects of a family intervention in reducing illicit drug use among highrisk hispanic adolescents. Fam Process. 2014;53(2):336–47.
Cowan JM. School choice as a latent variable: Estimating the “complier average causal effect” of vouchers in Charlotte. Policy Stud J. 2008;36(2):301–15.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
Breiman L, Friedman J, Olshen RA, Stone CJ. Classification and Regression Trees. CRC Press; 1984.
McConnell KJ, Lindner S. Estimating treatment effects with machine learning. Health Serv Res. 2019;54(6):1273–82.
Roy AD. Some thoughts on the distribution of earnings. Oxford Econ Pap. 1951;3(2):135–46.
Weinberg CR. Can DAGs clarify effect modification? Epidemiology. 2007;18(5):569–72.
Attia J, Holliday E, Oldmeadow C. A proposal for capturing interaction and effect modification using DAGs. Int J Epidemiol. 2022;51(4):1047–53.
Austin PC. An Introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46(3):399–424.
Walker AM, Patrick AR, Lauer MS, et al. A tool for assessing the feasibility of comparative effectiveness research. Comparative Effect Res. 2013;3:11–20.
Sturmer T, Rothman KJ, Avorn J, Glynn RJ. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution–a simulation study. Am J Epidemiol. 2010;172(7):843–54.
Sturmer T, WebsterClark M, Lund JL, et al. Propensity score weighting and trimming strategies for reducing variance and bias of treatment effect estimates: a simulation study. Am J Epidemiol. 2021;190(8):1659–70.
Tibshirani J, Athey S, Sverdrup E, Wager S. instrumental_forest: Instrumental Forest. https://rdrr.io/cran/grf/man/instrumental_forest.html. Published 2021. Accessed 15 May 2021.
Sadique Z, Grieve R, Diaz-Ordaz K, Mouncey P, Lamontagne F, O’Neill S. A machine-learning approach for estimating subgroup and individual-level treatment effects: an illustration using the 65 trial. Med Decis Making. 2022;42(7):923–36.
Cozad MJ, Chapman CG, Brooks JM. Specifying a conceptual treatment choice relationship before analysis is necessary for comparative effectiveness research. Med Care. 2017;55(2):94–6.
Lewbel A. The identification zoo: meanings of identification in econometrics. J Econ Lit. 2019;57(4):835–903.
Heckman JJ. Building bridges between structural and program evaluation approaches to evaluating policy. J Econ Lit. 2010;48(2):356–98.
Ho M, van der Laan M, Lee H, et al. The current landscape in biostatistics of real-world data and evidence: causal inference frameworks for study design and analysis. Stat Biopharm Res. 2021;15:1–14.
VanderWeele TJ, Mathur MB. Commentary: developing bestpractice guidelines for the reporting of Evalues. Int J Epidemiol. 2020;49(5):1495–7.
Lesko CR, Henderson NC, Varadhan R. Considerations when assessing heterogeneity of treatment effect in patient-centered outcomes research. J Clin Epidemiol. 2018;100:22–31.
Wilkinson J, Arnold KF, Murray EJ, et al. Time to reality check the promises of machine learning-powered precision medicine. Lancet Digit Health. 2020;2(12):e677–80.
Acknowledgements
The authors acknowledge the support of the University of South Carolina Big Data Health Science Center and the Center for Effectiveness Research in Orthopaedics.
Funding
This project was generously funded by a grant from the University of South Carolina Big Data Health Science Center and focused funding from the Center for Effectiveness Research in Orthopaedics.
Author information
Authors and Affiliations
Contributions
JMB created the simulation scenarios in the paper with conceptual and programming guidance from CGC, BKC, SF, and NH. JMB wrote the first draft of the manuscript with BKC, CGC, SF, and NH providing key insightful editorial changes in focus and direction.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This study uses simulated data with no human interaction. As such, this study was designated “exempt” by the University of South Carolina Institutional Review Board under Category 4 of 45 CFR 46.101(2)(b). All methods were carried out in accordance with relevant guidelines and regulations.
Consent for publication
Not applicable. Because this study had no human interaction, informed consent was deemed unnecessary according to national regulations by the University of South Carolina Institutional Review Board.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Brooks, J.M., Chapman, C.G., Chen, B.K. et al. Assessing the properties of patient-specific treatment effect estimates from causal forest algorithms under essential heterogeneity. BMC Med Res Methodol 24, 66 (2024). https://doi.org/10.1186/s12874-024-02187-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874-024-02187-5