Assessing regression to the mean effects in health care initiatives

Background: Interventions targeting individuals classified as "high-risk" have become commonplace in health care. High-risk may represent outlier values on utilization, cost, or clinical measures. Typically, such individuals are invited to participate in an intervention intended to reduce their level of risk, and after a period of time, a follow-up measurement is taken. However, individuals initially identified by their outlier values will likely have lower values on re-measurement even in the absence of an intervention. This statistical phenomenon is known as "regression to the mean" (RTM) and often leads to an inaccurate conclusion that the intervention caused the effect. Concerns about RTM are rarely raised in connection with most health care interventions, and it is uncommon to find evaluators who estimate its effect. This may be due to lack of awareness, cognitive biases that cause people to systematically misinterpret RTM effects by creating (erroneous) explanations to account for them, or deliberate design.

Methods: In this paper, the author fully describes the RTM phenomenon and tests the accuracy of the traditional approach to calculating RTM under an assumption of normality, using normally distributed data from a Monte Carlo simulation and skewed data from a control group in a pre-post evaluation of a health intervention. Confidence intervals are generated around the traditional RTM calculation to provide more insight into the potential magnitude of the bias introduced by RTM. Finally, suggestions are offered for designing interventions and evaluations to mitigate the effects of RTM.

Results: On multivariate normal data, the calculated RTM estimates match the true estimates. As expected, on skewed data the calculated method underestimated the true RTM effect. Confidence intervals provide helpful guidance on the magnitude of the RTM effect.
Conclusion: Decision-makers should always consider RTM a viable explanation of the observed change in an outcome in a pre-post study, and evaluators of health care initiatives should always take the appropriate steps to estimate the magnitude of the effect and control for it when possible. Regardless of the cause, failure to address RTM may result in wasteful pursuit of ineffective interventions, both at the organizational level and at the policy level.


Background
Interventions targeting individuals classified as "high-risk" have become commonplace in the health care industry. High-risk may capture anything from high utilization or cost of health services to outlier values on clinical measures (e.g., blood glucose, blood pressure, cholesterol). Typically, such individuals are invited to participate in an intervention intended to reduce their level of risk, and after a period of time, a follow-up measurement is taken. The pre-test to post-test change in the outcome is then generally presented as the impact of the intervention. This evaluation approach is problematic from a statistical standpoint because individuals initially identified by their high values will likely have lower values on re-measurement even in the absence of an intervention. This statistical phenomenon is known as "regression to the mean" (RTM) and often leads to the inaccurate conclusion that the intervention produced a treatment effect [1].
However, RTM is rarely addressed when evaluating health care delivery interventions or in the more general decision-making processes in health care [22]. This is despite its increasing relevance given the intensified focus on high-cost and high-need groups, and efforts to design programs specifically targeting them. There are at least three possible explanations for why this may be. First, evaluations of delivery-side interventions are traditionally not subject to the same rigor as medical interventions (i.e., RCTs), and with this may come a lack of awareness of the need to address RTM. Second, it has been shown that cognitive biases may cause people to subconsciously and systematically misinterpret RTM effects as intervention effects by creating (erroneous) explanations to account for them [23]. Third, there are more blatant examples in which organizations have a stake in the outcome of the intervention and capitalize on the RTM effect as a business strategy. For example, commercial disease management organizations have long advocated that their programs be evaluated without a control group, despite the recognition that the intervention group will demonstrate better outcomes due to regression to the mean [24]. Regardless of the cause, failure to address RTM may result in wasteful pursuit of ineffective interventions, both at the organizational level and at the policy level.
In this paper, we seek to provide researchers, organizational decision-makers, and policy-makers with a broader set of tools to understand and assess RTM effects. First, real examples of RTM in health care are presented to illustrate the phenomenon. Next, the traditional method for calculating the RTM effect in normally distributed data is described, and these RTM effect estimates are compared with RTM effects generated from a Monte Carlo simulation of normally distributed data. These comparisons are then repeated using skewed data from a control group in a health coaching study, to illustrate the shortcomings of the traditional approach to accurately estimating the RTM effect in the common scenario of non-normal data. We use this to motivate the primary contribution of the paper: the estimation of standard errors and confidence intervals around the RTM effect. Although largely absent from the existing RTM literature, confidence intervals (a measure of the precision of single-value RTM estimates) are valuable because they provide a range of values considered plausible for the population. Finally, the advantage of calculating confidence intervals around RTM estimates is discussed in detail, and approaches for designing health care interventions to mitigate, or at least account for, the effects of RTM are provided.

Methods
The regression to the mean concept

Regression to the mean was first described over a century ago by Francis Galton (later Sir Francis) upon discovering that, on average, tall parents have children shorter than themselves and short parents have children taller than themselves [25]. RTM is the result of both random measurement error and extremity of scores from the mean [26]. A simple example occurs in measuring blood pressure or heart rate: rarely are any two observations identical, even if taken minutes apart, due to natural biologic variability or measurement error. At the individual level this is called within-subject variability. Additionally, the more extreme the initial value, the greater the expected change in the follow-up score. However, over the course of many repeated observations, this variability narrows around the true mean [27,28]. Similar to individual-level measures, groups with high (or low) initial mean values will also tend to regress toward the mean of the overall sample.
In the context of an intervention, RTM can easily be mistaken for a program effect in the absence of an equivalent comparison group. The best approaches to illustrating the RTM phenomenon are either using observations taken from time periods in which no interventions were implemented, or using control group data derived from a research study. Figure 1 illustrates the first approach by displaying the average costs for the highest and lowest quintile of a continuously enrolled cohort of chronically ill health plan members over the course of two years during which no chronic disease interventions were in place [29]. Each of the cohorts, coronary artery disease (CAD), congestive heart failure (CHF), and chronic obstructive pulmonary disease (COPD), exhibits a similar RTM pattern. In the first year, the highest quintile's average costs range from approximately $20,000 to $27,000 across the three conditions. In the second year, the average costs in these groups drop to a range of approximately $7,000 to $10,000. Conversely, all three cohorts in the lowest quintile of costs (less than $300) in the first year increased to between $4,700 and $8,000 in the second year. The diagonal line is the expected trend line had there been perfect correlation between the first and second measurements (i.e., no variability between measurements, no measurement error, and thus no RTM). This scenario clearly illustrates RTM. Had a disease management program targeting high-cost CAD, CHF, or COPD patients taken place during this period, an evaluation of the impact on costs would have wrongly attributed these reductions to a program effect.
Using data from a control group also illustrates RTM. Figure 2 presents physical component summary (PCS) scores (with bootstrapped 95% confidence intervals) from the SF-12 health status survey [30] for a control group (n = 118) from a study conducted at a large organization in the Northwest [31]. Control group members were surveyed twice, once at program commencement and again at three months, and received no intervention. Scale values are standardized from 0 to 100, with higher values indicating better physical health. If a high-risk group is classified as having a first-period PCS score below 44.25, which corresponds to the 25th percentile at the U.S. national level [32], RTM is evident: this group's mean PCS score increases significantly (no overlap in the pre- and post-measurement confidence intervals), by over 8 points (22.6%), in the second period, while the lower-risk group (those in the 26th-100th percentile) remained essentially unchanged (because the mean value of this group was already close to the overall mean of the entire sample, there was nowhere to regress to).
The examples presented in this section use measures that commonly serve as outcomes in health care interventions, and both cases clearly illustrate RTM. This suggests that there are likely many contexts in which RTM, and not a program effect, explains an observed change from initial outlier status to follow-up values closer to the overall mean.
Figure 1 (caption): Actual data illustrating the regression to the mean phenomenon in Coronary Artery Disease (CAD), Congestive Heart Failure (CHF), and Chronic Obstructive Pulmonary Disease (COPD). Quintile I is the lowest cost group and V the highest. All individuals were continuously enrolled during the 2-year period. The diagonal line represents perfect correlation between the first and second year costs, which can only be achieved in the complete absence of variability between measurements and no measurement error.

Figure 2 (caption, fragment): ... participating in a health coaching study (Butterworth et al. 2006). All participants were surveyed twice, once at program commencement and then again at three months. Squares/circles represent mean scores and capped lines represent 95% bootstrapped confidence intervals (1000 resamples).

Classic formulae for estimating the magnitude of RTM

Estimation of the RTM effect for normally distributed data can be conducted when, at a minimum, the following four parameters are known: the population mean of the pre-test (μ), the population variance of the pre-test (σ²), the correlation between the pre-test and post-test (ρ), and the cutoff score defining the high-risk group (κ). The expected RTM effect is [27,33,34]:

    RTM effect = C(z) γ² / √(δ² + γ²)    (Equation 1)

where γ² is the within-subject variance (σ² − δ²), δ² is the between-subject variance (ρσ²), and thus (δ² + γ²) is the pooled variance (when the square root is taken, this becomes the pooled standard deviation). C(z) is calculated beginning with the z-score:

    z = (κ − μ) / σ    (Equation 2)

whenever high-risk is indicated by values above κ, and

    z = (μ − κ) / σ    (Equation 3)

whenever high-risk is indicated by values below κ. As before, μ is the baseline population mean, σ is the standard deviation of the entire pre-test sample, and κ is the cutoff score. C is then calculated as:

    C(z) = φ(z) / (1 − Φ(z))    (Equation 4)

where φ(z) is the probability density function and Φ(z) is the cumulative distribution function for z in a standard normal distribution. The expected mean values for both pre-test and post-test in the high-risk group can also be calculated as follows:

    Expected pre-test mean (high-risk) = μ ± Cσ    (Equation 5)

    Expected post-test mean (high-risk) = μ ± Cρσ    (Equation 6)

where values are added whenever high-risk is indicated by values above κ, and subtracted whenever high-risk is indicated by values below κ. Subtracting the expected pre-test mean (Equation 5) from the expected post-test mean (Equation 6) yields the same expected RTM effect as that derived in Equation 1 (as does Cσ(1 − ρ)).
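As a rough sketch, the quantities in Equations 1-4 can be computed with only the Python standard library (the normal CDF follows from the error function). The function name `rtm_effect` and the example parameters are illustrative, not part of the original study's code:

```python
import math

def rtm_effect(mu, sigma, rho, kappa, high_above=True):
    """Expected regression-to-the-mean effect for a normal pre-test
    truncated at cutoff kappa (a sketch of Equations 1-4)."""
    # z-score of the cutoff; the sign depends on whether high-risk
    # means values above or below kappa (Equations 2 and 3)
    z = (kappa - mu) / sigma if high_above else (mu - kappa) / sigma
    # standard normal pdf and cdf at z
    pdf = math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    C = pdf / (1 - cdf)                 # Equation 4
    # Equation 1; algebraically equal to C * sigma * (1 - rho)
    return C * sigma * (1 - rho)

# Illustrative parameters: mean cost $5,000, sd $1,350, rho = 0.50,
# cutoff near the top-quintile boundary (~$6,136)
print(round(rtm_effect(5000, 1350, 0.50, 6136), 1))
```

Note how a higher pretest-posttest correlation shrinks the expected RTM effect, since the (1 − ρ) factor goes to zero when measurements are perfectly correlated.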

Testing the performance of the RTM formulae
We examine the performance of the RTM equation (Equation 1) in estimating the RTM effect using two approaches. First, a Monte Carlo simulation study is conducted assuming medical cost as the outcome, as cost is often a primary focus of health services research. Following the simulation, the performance of the RTM formulae is demonstrated using actual data (the PCS data described in the previous section).

Design of the Monte Carlo simulation
An "actual" RTM effect is generated by drawing two variables from a multivariate normal distribution to represent the pre-test and post-test costs for a pseudo-population of 10,000 observations, with means of $5,000, standard deviations of $1,350, and three pretest-posttest correlations: 0.25, 0.50 and 0.75. The minimum value of the highest pre-test quintile of cost is set as the cutoff (≈ $6,136), with values above and below that level categorized as "high-risk" and "low-risk", respectively. The mean difference in pretestposttest costs for the two risk tiers represents the "actual" RTM effects. We compare this with the "calculated" RTM effect for the high-and low-risk groups using Equation 1 with the same cutoff value (≈ $6,136). This process is repeated 10,000 times for each of the three correlation levels and the actual versus calculated RTM effects are reported for the low and high-risk groups. The simulation was conducted in Stata 12.1 (StataCorp, College Station, TX), using the built in simulate command, and rtmci, a command written by the author (available upon request).

Design of the empirical example
Here, the PCS data for the 118 controls [31] described earlier and illustrated in Figure 2 are revisited, in order to demonstrate the performance of the RTM formulae when data are skewed (p < 0.00001 for the Shapiro-Wilk W test). The pre-test mean, post-test mean, and mean difference in pretest-posttest PCS scores are used to generate the "actual" RTM effects, and the "calculated" comparisons are again computed using Equations 1, 5 and 6 and rtmci in Stata. The differences between the actual and calculated values are then compared, and for all estimates, 95% confidence intervals are computed via bootstrap simulation (1000 resamples of the observations, with replacement).

Table 1 (notes): RTM (H) is the regression to the mean effect for the high-risk group, and RTM (L) is the regression to the mean effect for the low-risk group. "Actual" represents the RTM effect derived directly from the data, and "calculated" is derived using Equation 1. ρ is the pretest-posttest correlation for the entire sample.
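The percentile-bootstrap procedure described above can be sketched as follows. Since the study's PCS data are not reproduced here, the example uses synthetic right-skewed (lognormal) pre/post pairs as a stand-in, and, unlike the PCS example, defines high-risk as high baseline values; both are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a skewed control group: correlated lognormal
# pre/post measurements with pretest-posttest correlation ~0.74
n = 118
z1 = rng.normal(size=n)
z2 = 0.74 * z1 + np.sqrt(1 - 0.74**2) * rng.normal(size=n)
pre, post = np.exp(z1), np.exp(z2)

def actual_rtm(pre, post, kappa):
    """Mean pre-to-post change in the group selected as high-risk at baseline."""
    high = pre >= kappa
    return (pre[high] - post[high]).mean()

kappa = np.quantile(pre, 0.80)
point = actual_rtm(pre, post, kappa)

# Percentile bootstrap: resample subjects (pre/post pairs) with replacement,
# recompute the RTM effect each time, and take the 2.5th/97.5th percentiles
boots = [actual_rtm(pre[idx], post[idx], kappa)
         for idx in (rng.integers(0, n, n) for _ in range(1000))]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(round(point, 3), (round(lo, 3), round(hi, 3)))
```

Resampling whole subjects (pairs) rather than individual measurements preserves the pretest-posttest correlation, which is what the RTM effect depends on.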

Empirical data
The summary statistics of the PCS data are as follows: pre-test overall sample mean = 53.12, pre-test overall sample standard deviation = 8.27, pretest-posttest correlation = 0.742, and cutoff = 44.25. Table 2 provides results for the high-risk group (PCS values ≤ 44.25, n = 34). As shown, the actual pre-test mean is 3.38 points lower than that derived by the calculated method, and the actual post-test mean is 1.52 points higher than that derived by the calculated method. As a result, the actual RTM effect is 8.28 points, much higher than the calculated method's point estimate of 3.38. The difference between these two estimates is 4.90 points, with a confidence interval of 1.12 to 8.68 points.

Discussion
The results of the simulation study demonstrate that the formulae for estimating RTM effects [27,33] accurately calculate RTM when the data are normally distributed. By extension, these results support the use of RTM analysis in pre-post observational studies as a means of estimating the RTM effect. However, the results using the skewed data suggest that the RTM calculation significantly underestimated the true RTM effect, by between 1.12 and 8.68 points. Generally, when researchers seek to calculate the RTM effect using skewed data, transforming the data to approximate normality before applying the traditional formulae may suffice. However, if transforming the data to another scale would lead to a loss of interpretability (as would be the case with SF-12 data), performing the calculations on the original scale and computing confidence intervals that reveal the magnitude of the error offers an alternative approach that may be more useful. In our example, the confidence interval for the calculated RTM effect was 1.54 to 5.22, which overlaps with the actual RTM confidence interval of 4.35 to 12.21. Thus, the confidence interval for the calculated RTM effect provides a range of values more closely aligned with the true effect than the point estimate alone. A third option is to consider models devised to estimate regression to the mean effects in non-normally distributed data [35,36]. However, some of these approaches rely on non-parametric methods, such as kernel density estimators [36], and are sensitive to the choice of bandwidth. Thus, the various approaches to estimating RTM will likely yield different estimates depending on which methods are employed, even within the same dataset. Here again, confidence intervals can help the evaluator determine the degree of overlap among the estimates derived from the various methods.

Designing interventions to mitigate the RTM effect
While earlier sections focused on illustrating RTM and offering suggestions for how to estimate the magnitude of the RTM effect, ideally studies are designed to mitigate the effect of RTM. The randomized controlled trial (RCT) is the obvious study design to control for RTM because randomly assigned groups should be equally affected (i.e., the treatment effect is the net effect after eliminating any RTM). The regression-discontinuity (RD) design should be considered a viable alternative when randomization is not possible [37,38]. The RD design relies on a cutoff point on a continuous pre-intervention variable to assign individuals to treatment. The individuals just to the right and left of the cutoff are assumed to be exchangeable, as in a randomized trial. Because individuals do not have precise control over their assignment score (nor would they know where the cutoff lies), they cannot self-select into treatment. Thus, we would expect a similar RTM effect for both groups in the neighborhood of the cutoff. A third approach to mitigating the RTM effect in the design stage of an intervention is to base treatment assignment on a cutoff applied to the mean of multiple pre-tests rather than a single pre-test [27,33,39,40]. This has the effect of stabilizing the mean and reducing within-subject variability. When multiple pre-test measurements are used, the previously described equations require minor modification [27,28,33]. In Equation 1, the within-subject variance γ² (in both the numerator and denominator) is now divided by the number of pre-tests n from which the mean is derived, becoming γ²/n. In all other equations, σ is replaced by the standard deviation of the mean of the n pre-tests, √(δ² + γ²/n).

Table 2 (notes): "Actual" indicates that the pre-test, post-test and RTM effects were estimated directly from the data. "Calculated" indicates that Equations 1, 5 and 6 were applied directly to the existing data. Standard errors and 95% confidence intervals were derived by bootstrap resampling of the data 1000 times.
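The shrinking of the RTM effect as more pre-tests are averaged can be sketched numerically. The function below is illustrative: it applies the γ²/n modification to Equation 1, with σ replaced by the standard deviation of the mean of n pre-tests, √(δ² + γ²/n), and reuses the earlier example parameters:

```python
import math

def rtm_effect_multi(mu, sigma, rho, kappa, n_pretests=1):
    """Sketch of the modified Equation 1 when assignment uses the mean
    of n pre-tests: within-subject variance gamma^2 becomes gamma^2/n."""
    delta2 = rho * sigma**2              # between-subject variance
    gamma2 = sigma**2 - delta2           # within-subject variance
    g2n = gamma2 / n_pretests            # reduced within-subject variance
    sd_mean = math.sqrt(delta2 + g2n)    # sd of the mean of n pre-tests
    z = (kappa - mu) / sd_mean
    pdf = math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    C = pdf / (1 - cdf)
    return C * g2n / sd_mean             # modified Equation 1

# Illustrative parameters as before: mu=$5,000, sigma=$1,350, rho=0.50
for n in (1, 2, 4):
    print(n, round(rtm_effect_multi(5000, 1350, 0.50, 6136, n), 1))
```

With n = 1 this reduces to the single pre-test formula; as n grows, the within-subject component shrinks and so does the expected RTM effect, which is exactly why averaging multiple pre-tests stabilizes the assignment score.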

Controlling for RTM through data analysis
When only retrospective observational data are available, several approaches may be considered to control for RTM. Matching techniques [41] allow the investigator to approximate the randomization process by creating a control group that is essentially equivalent to the treatment group on observed pre-intervention characteristics, especially on the pre-test variable that we are most concerned leads to RTM. One particular advantage of matching techniques over other covariate adjustment strategies (e.g., multiple regression models) is that the investigator can directly assess how well the pre-test variable's distribution overlaps between groups using graphical or numerical diagnostics [41]. A high degree of overlap in the distribution increases our confidence that RTM is effectively controlled for, as we would expect in an RCT. The most common analytic approach in the literature for adjusting for RTM is analysis of covariance (ANCOVA), which controls for the baseline level of the outcome by including the pre-test as a covariate in the model. Additionally, an RTM "correction factor" [42-44] can be applied to each person's pre-test score, and that adjusted pre-test score can then be used in the ANCOVA. For example, Trochim [44] adjusts an individual's pre-test score as follows:

    x_adj = x̄ + ρ(x − x̄)

where x̄ is the treatment group mean, ρ is the pre-post correlation for that treatment group, and x is the individual's pre-test value. It is important to keep in mind, however, that when using ANCOVA (with or without the corrected pre-test), model assumptions, such as linearity between outcome and covariates, must be tested. Moreover, contrary to matching strategies where covariate balance can be directly assessed, in ANCOVA models there is no assurance that the treatment groups are comparable on all baseline covariates.
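The correction step can be sketched as follows. The data are hypothetical, and the "ANCOVA" is simplified to an ordinary least-squares fit of post-test on the adjusted pre-test for a single group (a full ANCOVA would also include a treatment indicator):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pre/post scores for one group with true correlation ~0.6
n, rho = 200, 0.6
pre = 50 + 10 * rng.normal(size=n)
post = 50 + rho * (pre - 50) + 10 * np.sqrt(1 - rho**2) * rng.normal(size=n)

# Trochim-style RTM correction: shrink each pre-test toward the group
# mean by the estimated pre-post correlation
xbar = pre.mean()
r = np.corrcoef(pre, post)[0, 1]
pre_adj = xbar + r * (pre - xbar)

# Least-squares fit: post ~ intercept + adjusted pre-test
X = np.column_stack([np.ones(n), pre_adj])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)
print(coef)
```

The adjustment compresses the spread of the pre-test scores (variance shrinks by a factor of r²), so extreme baseline values carry less weight, which is precisely the RTM correction's intent.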
In fact, it is imperative that decision-makers consider other potential sources of bias (in addition to RTM) that may masquerade as a treatment effect. This is particularly true when using observational data, since participants and non-participants will likely differ on several characteristics (e.g., health behaviors) not often available in claims analysis [45]. Finally, perhaps the easiest approach for adjusting outcomes to control for RTM is simply to subtract the calculated RTM effect (Equation 1) from the overall treatment effect estimate [34]. Moreover, with the additional availability of confidence intervals, the investigator can provide a range of "net" treatment effect estimates when data are skewed.

Conclusion
In this paper we have illustrated that health care interventions are susceptible to the effects of RTM when individuals are chosen to participate in the intervention based on an outlier baseline "risk" score and there is large within-subject variability or measurement error. When estimating the RTM effect on normally distributed data, the formulae produce estimates identical to the true (simulated) values. However, the equations underestimated the RTM effect in right-skewed data. We described several approaches investigators can use to adjust for RTM, depending on the degree of control they have over the intervention and evaluation designs. However, designing interventions to mitigate the effects of RTM is preferable to retrospectively estimating the extent to which RTM may explain an observed treatment effect. Most importantly, both evaluators and stakeholders should be aware of RTM as a major source of bias in intervention studies, and take the appropriate steps to estimate its effect and control for it whenever possible, to ensure valid conclusions about program effectiveness.