New design
By proposing the present hemoglobin-based design, we aim to reconcile observational studies and RCTs investigating transfusion safety. There are three core components of the new design that distinguish it from those of existing studies.
(1) Re-definition of the study population.
It is very common for transfusion studies (both previous observational studies and RCTs) to limit the study population to a specific surgical procedure or disease [7, 21], in addition to applying other inclusion/exclusion criteria, such as age limits. Rather than following this common practice, we redefined the study population according to a stable hemoglobin range. “Stable” here means no active bleeding, and the range of hemoglobin concentration is determined by the transfusion threshold of interest, e.g., 7.5–9.5 g/dL in our example study. Using these criteria, we focused on the transfusion effect for patients with anemia who fall within a gray zone for the transfusion decision, according to current guidelines [22]. Data from any patient within this defined range who also meets the study-specific inclusion criteria could subsequently be analyzed.
Key point 1
An underlying assumption of the hemoglobin-based definition is that the included patients are a homogeneous population regarding the decision for transfusion. The reason is that these patients have a similar level of anemia, despite different surgical categories or other patient-specific conditions. The new definition of the study population largely retains the authenticity of real-world data in representing patients with anemia seen in daily practice and gives our design the potential to augment external validity beyond that of RCTs, which typically have a very narrow patient spectrum.
Key point 2
The new definition of the study population naturally excludes patients with active bleeding and severe anemia, two strong indications for transfusion in which a decision not to transfuse is hard to explain (although such cases exist in real-world practice); it also excludes unreasonable transfusion beyond the current clinical standard (hemoglobin ≥10 g/dL) [22]. These properties largely avoid bias by indication, which is present in observational studies that attempt to associate transfusion with outcomes using any available sample.
(2) Selection of comparison groups.
In our new design, we selected two comparison groups: a transfused group (exposure) and non-transfused group (control), both defined according to the hemoglobin concentration of interest, hereafter referred to as the trigger value. The trigger value is a very important component of the RCT design. For instance, the decision to transfuse a patient is made based on whether the hemoglobin level is below 10 g/dL for the liberal transfusion arm and below 8 g/dL for the restrictive transfusion arm [23]. Similarly, in observational studies, this decision is also primarily based on the hemoglobin level [24]. Because patients can have multiple hemoglobin tests and multiple transfusions in practice, we defined the trigger value as the last measurement before the initial transfusion in the exposure group and the nadir during the hospital stay in the control group. The purpose of these choices is to unify the decision to transfuse (or not transfuse) a patient according to the same decision criterion, namely, the degree of anemia in the patient.
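The trigger-value rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code: the function name `trigger_value`, the `(time, value)` representation of hemoglobin tests, and the choice to count only measurements taken strictly before the first transfusion are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# One hemoglobin test as (time, value in g/dL). The time unit is arbitrary
# (e.g., hours since admission). This layout is a hypothetical choice.
HbTest = Tuple[float, float]


def trigger_value(hb_tests: Sequence[HbTest],
                  first_transfusion_time: Optional[float]) -> Optional[float]:
    """Return the trigger hemoglobin value for one patient.

    Transfused patient (exposure): last measurement before the initial
    transfusion. Non-transfused patient (control): nadir during the stay.
    Returns None when no qualifying measurement exists.
    """
    if first_transfusion_time is None:
        # Control group: lowest hemoglobin value during the hospital stay.
        values = [value for _, value in hb_tests]
        return min(values) if values else None

    # Exposure group: keep tests taken strictly before the first transfusion
    # (assumption: a test at the same timestamp does not count).
    prior = [(t, v) for t, v in hb_tests if t < first_transfusion_time]
    if not prior:
        return None
    # The most recent of those tests supplies the trigger value.
    return max(prior, key=lambda test: test[0])[1]
```

With tests at hours 0, 24, 48, and 72, a patient first transfused at hour 30 would take the hour-24 value as the trigger, whereas a patient who was never transfused would take the lowest of the four values.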
Key point 3
The comparison of liberal versus restrictive transfusion strategies, in essence, compares the effect of transfusion to that of no transfusion when the anemia level of a patient is within the critical range of the transfusion decision, that is, the hemoglobin range defined by the low and high thresholds (see Fig. 1). By targeting the critical range, our new design can approximate the experimental design and the study efficiency is greater in terms of outcome comparison because in the experimental design, the interventions are identical for liberal and restrictive strategies beyond this critical range [3].
(3) Dealing with patient heterogeneity.
Unlike in RCTs, where the only apparent difference between randomized patient groups is the pre-specified transfusion protocol (liberal or restrictive), patient heterogeneity (i.e., skewness in baseline characteristics between comparison groups) is high in observational studies and is usually neither fully recognized nor addressed, thereby undermining the validity of effect estimation. To quantify heterogeneity between the comparison groups with observed information, we propose using a uniform measurement, the standardized mean difference (SMD) [25], defined as:
$$SMD=\begin{cases}\dfrac{\left|\overline{x}_1-\overline{x}_2\right|}{\sqrt{\left(s_1^2+s_2^2\right)/2}}\times 100\%, & \text{for continuous variables}\\[1ex] \dfrac{\left|p_1-p_2\right|}{\sqrt{\left(p_1\left(1-p_1\right)+p_2\left(1-p_2\right)\right)/2}}\times 100\%, & \text{for binary variables}\end{cases}\tag{1}$$
where \(\overline{x}\), \(s^2\), and \(p\) are the mean, variance, and proportion in a comparison group, respectively. Commonly, an SMD value smaller than 10% is suggestive of minor differences between comparison groups.
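As a minimal sketch of Eq. (1), the two SMD forms can be computed with the Python standard library alone. The function names are hypothetical, and using the sample variance (denominator n − 1) for \(s^2\) is an assumption, because the text does not state which variance estimator was used.

```python
import math
from statistics import mean, variance


def smd_continuous(x1, x2):
    """SMD (in %) for a continuous variable, per Eq. (1).

    Assumption: s^2 is the sample variance (n - 1 denominator).
    """
    pooled_sd = math.sqrt((variance(x1) + variance(x2)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both variances are zero")
    return abs(mean(x1) - mean(x2)) / pooled_sd * 100


def smd_binary(p1, p2):
    """SMD (in %) for a binary variable, given the two group proportions."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both proportions are 0 or 1")
    return abs(p1 - p2) / pooled_sd * 100


if __name__ == "__main__":
    # Illustrative numbers only, not study data.
    print(round(smd_binary(0.30, 0.50), 1))  # prints 41.7
```

In this made-up example, proportions of 30% and 50% give an SMD of about 41.7%, well above the 10% level commonly taken to indicate minor differences.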
By investigating apparent, moderate, and minor sources of patient heterogeneity, pertinent approaches such as restriction, stratification, or other statistical methods can then be used to address bias by indication (e.g., severe anemia, bleeding) or other common confounding factors (e.g., age, comorbidity). A continuous effort to monitor and reduce patient heterogeneity in different analytic datasets can help to improve the validity of the estimated transfusion effect, as illustrated in the example below.
Data source and patient selection
We used real-world data prospectively collected in a multicenter quality improvement project conducted from 2015 to 2016 at four academic/teaching hospitals that represent the regional diversity of China [26]. We focused on hospitalized older patients (aged 60 years and over) undergoing general surgery. This surgical population underwent different categories of procedures (mainly intestine, gallbladder, thyroid, and stomach surgeries), comprising 51% of the non-orthopedic, non-cardiac surgery volume and accounting for 55% of red blood cell transfusions among non-orthopedic, non-cardiac surgical patients.
We selected a “base population” to simulate previous observational studies; from this, we further derived a “study population” to demonstrate the effect of the hemoglobin-based design. The criteria for the base population were: (1) major surgery, defined as requiring the presence of an anesthesiologist during surgery; and (2) hospital stay ≥24 hours. In addition to these criteria, the study population was defined as: (3) no bleeding ≥500 mL; and (4) within a hemoglobin range of 7.5–9.5 g/dL. The choice of hemoglobin thresholds for defining the critical range was based on a planned RCT on liberal versus restrictive transfusion among older non-cardiac surgical patients because no pertinent evidence for general surgery patients is available. Patients residing at altitudes of 2000–5000 m above sea level were excluded because the common transfusion threshold may not be applicable. Ethical approval was obtained from the institutional review board of Peking Union Medical College Hospital (approval no.: S-574); requirement for written informed consent was waived because individual information was analyzed anonymously.
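The selection criteria above can be expressed as two simple filters. This sketch is illustrative only: the `Patient` fields are hypothetical names. Two choices are assumptions not stated in the text: the hemoglobin range is treated as inclusive at both ends, and the high-altitude exclusion is applied at the base-population stage.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for this illustration.
    major_surgery: bool    # anesthesiologist present during surgery
    stay_hours: float      # length of hospital stay in hours
    blood_loss_ml: float   # bleeding volume in mL
    trigger_hb: float      # trigger hemoglobin value in g/dL
    high_altitude: bool    # resides at 2000-5000 m above sea level


def in_base_population(p: Patient) -> bool:
    """Criteria (1) and (2), plus the altitude exclusion (assumed here)."""
    return p.major_surgery and p.stay_hours >= 24 and not p.high_altitude


def in_study_population(p: Patient, hb_low: float = 7.5,
                        hb_high: float = 9.5) -> bool:
    """Base criteria plus (3) no bleeding >= 500 mL and (4) hemoglobin range.

    Assumption: the range is inclusive at both ends.
    """
    return (in_base_population(p)
            and p.blood_loss_ml < 500
            and hb_low <= p.trigger_hb <= hb_high)
```

Keeping the thresholds as parameters makes it straightforward to re-derive the study population for a different critical range.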
Study variables
For comparability, we selected patient outcomes that are similar to those used in previous observational studies and RCTs [27, 28]; these outcomes were death (in-hospital or within 30 days of discharge) and in-hospital complications, including ischemic events (myocardial infarction, stroke, and acute renal failure); infection (surgical site infection, pneumonia, sepsis, septic shock, and urinary tract infection); and others (cardiac arrest requiring cardiopulmonary resuscitation, heart failure, reintubation, mechanical ventilation for ≥48 hours post-operatively, atelectasis, respiratory failure, wound dehiscence, delayed incision healing, pulmonary embolism, venous thrombosis, and multiple organ dysfunction syndrome). These outcomes are considered to be directly or indirectly associated with anemia. We defined the primary study outcome as a composite of these outcomes to stabilize these low-incidence events, i.e., using a binary variable that indicates the occurrence of any of these adverse events.
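The composite outcome amounts to a logical OR over the individual events. As a small sketch, with the event names being hypothetical dictionary keys rather than the study's actual variable names:

```python
def composite_outcome(events: dict) -> int:
    """Return 1 if any recorded adverse event occurred, otherwise 0.

    `events` maps a hypothetical event name (e.g., "death", "pneumonia")
    to True/False for one patient.
    """
    return int(any(events.values()))
```

A patient with only pneumonia recorded as True and a patient with several events both receive a value of 1, which is what pools the individually rare events into a single binary variable.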
Transfusion information was obtained directly from clinical blood bank systems. We also considered basic patient information (age, sex, smoking status, body mass index), preoperative comorbidity (hypertension, coronary heart disease, diabetes, stroke, chronic obstructive pulmonary disease), preoperative laboratory test findings (low albumin, high creatinine, and high white blood cell count), physical status (evaluated by an anesthesiologist and recorded using the American Society of Anesthesiologists [ASA] score), intraoperative features (operation time, high blood loss), and postoperative return (intensive care unit [ICU] or other). The data collection methods were standardized according to a study protocol reported elsewhere [26].
Statistical analysis
To demonstrate the effect of our new design, we analyzed and presented results for four datasets: the base population before (Base Match−) and after (Base Match+) propensity-score matching; and the study population before (Study Match−) and after (Study Match+) propensity-score matching. Propensity scores were calculated with a multivariable logistic regression model using several key covariates, namely, ASA score ≥ 3, preoperative hemoglobin, operation time ≥ 3 hours, blood loss ≥500 mL, and ICU admission for the base population, and ASA score ≥ 3, preoperative hemoglobin, operation time ≥ 3 hours, and ICU admission for the study population; these variables were clinically and/or statistically significantly related to both transfusion and patient outcomes. Matching was based on a 1:1 ratio using the nearest-neighbor method [29]. A caliper of 0.2 standard deviations of the propensity score was used; the choice of caliper value and the selection of key covariates were made considering both the matching rate and balance of covariates between groups [30].
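The matching step can be illustrated with a small standard-library sketch that takes already estimated propensity scores as input. It is not the study's SAS/R implementation. The greedy processing order, matching without replacement, and computing the caliper from the standard deviation of the pooled raw propensity scores are assumptions made for the example, and the function name is hypothetical.

```python
from statistics import stdev


def match_nearest_neighbor(treated: dict, control: dict,
                           caliper_sd: float = 0.2) -> list:
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated / control map a patient id to its propensity score.
    Returns a list of (treated_id, control_id) pairs. A treated patient
    stays unmatched when no unused control lies within the caliper.
    """
    # Caliper: 0.2 SD of the propensity score (pooled over both groups,
    # an assumption of this sketch).
    pooled = list(treated.values()) + list(control.values())
    caliper = caliper_sd * stdev(pooled)

    available = dict(control)  # controls not yet used (no replacement)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs
```

Because the procedure is greedy, the result can depend on the order in which transfused patients are processed; randomizing or sorting that order is a common design choice that this sketch leaves out.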
We treated the propensity score (i.e., the estimated individual probability of receiving transfusion) as an overall index of patient heterogeneity and presented the overlapping range between groups using box plots. To closely examine patient heterogeneity, we quantified between-group differences regarding specific variables using the SMD measurement. The study effect of interest (i.e., the transfusion–outcome association) was quantified using odds ratio (OR) estimated in multivariable logistic regression, adjusting for covariates that were significantly related to patient outcomes, namely, ASA score ≥ 3, age ≥ 75 years, preoperative comorbidity, preoperative hemoglobin, operation time ≥ 3 hours, blood loss ≥500 mL, and ICU admission for the base population and ASA score ≥ 3, age ≥ 75 years, preoperative comorbidity, preoperative hemoglobin, and operation time ≥ 3 hours for the study population.
No imputation of missing data was performed because the missing rates were negligible (highest for operation time: 6.86%). A two-tailed p value of < 0.05 was considered statistically significant. All analyses were performed using SAS software, version 9.4 (SAS Institute, Cary, NC, USA) and R, version 3.6.3 (The R Foundation for Statistical Computing, Vienna, Austria). Plots were drawn with Python, version 3.10.2 (Python Software Foundation, Beaverton, OR, USA).