National surveillance of stroke quality of care and outcomes by applying post-stratification survey weights on the Get With The Guidelines-Stroke patient registry

Background The U.S. lacks a stroke surveillance system. This study develops a method to transform an existing registry into a nationally representative database to evaluate acute ischemic stroke care quality. Methods Two statistical approaches are used to develop post-stratification weights for the Get With The Guidelines-Stroke registry by anchoring population estimates to the National Inpatient Sample. Post-stratification survey weights are estimated using a raking procedure and Bayesian interpolation methods. Weighting methods are adjusted to limit the dispersion of weights and make reasonable epidemiologic estimates of patient characteristics, quality of hospital care, and clinical outcomes. Standardized differences in national estimates are reported between the two post-stratification methods for anchored and non-anchored patient characteristics to evaluate estimation quality. Primary measures evaluated are patient and hospital characteristics, stroke severity, vital and laboratory measures, disposition, and clinical outcomes at discharge. Results A total of 1,388,296 acute ischemic strokes occurred between 2012 and 2014. Raking and Bayesian estimates of clinical data not available in administrative data fall within a 5 to 10% margin of expected values. The median weight for the raking method is 1.386, the weight at the 99th percentile is 6.881, and the maximum weight is 30.775. The median Bayesian weight is 1.329, the weight at the 99th percentile is 11.201, and the maximum weight is 515.689. Conclusions Leveraging existing databases with patient registries to develop post-stratification weights is a reliable approach to estimate acute ischemic stroke epidemiology and to monitor stroke quality of care nationally. These methods may be applied to other diseases or settings to better monitor population health. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01214-z.

are not available for incident disease and the assessment of healthcare quality [2]. The IOM's report recommends that surveillance systems be created to track progress on cardiovascular burden and inform efforts to reduce disease burden. Since the IOM's publication in 2011, robust disease surveillance systems for cardiovascular disease have not been developed in the U.S. The glaring need to build such a surveillance system continues to be emphasized [2]. Systematically integrating various paper and electronic health record systems across the U.S. remains an insurmountable task. For this study, we sought to overcome these challenges by integrating two existing data sources for future epidemiologic and outcomes research work related to acute ischemic stroke.
A non-representative database may be transformed into a representative one if appropriate post-stratification weights are estimated to rebalance over and under-represented segments of a target population of interest [3]. Statistical methods may be used to post-stratify non-random sample observations and approximate true target population estimates.
In the U.S., the best estimates for the incidence and utilization of hospital services are publicly available through databases sponsored by the Agency for Healthcare Research and Quality's Healthcare Cost and Utilization Project [4]. The National Inpatient Sample (NIS) is a structured random sample of U.S. hospitalizations that is then weighted to represent national hospital utilization. However, the database does not include detailed clinical data such as stroke severity, laboratory data, medical treatments received, and patient-reported outcomes. A few community cohort and case-control studies are currently featured in the annual American Heart Association (AHA) statistical update on heart disease and stroke statistics, but they are not nationally representative and are inadequate to measure stroke burden and quality of care nationally [5][6][7].
The AHA-sponsored Get With The Guidelines (GWTG) program includes rich clinical data for quality improvement and research analyses [8]. Yet, registries with volunteer hospitals are not proportionally representative of the entire nation [9,10]. For this study, we implement and validate advanced post-stratification weighting methods and describe the clinical characteristics of the national acute ischemic stroke population using the AHA's GWTG-Stroke registry. Implementation of these methods forms a platform for future national surveillance and health care quality research.

Data source
We used the GWTG-Stroke registry from 2012 to 2014 to evaluate post-stratification weighting procedures to represent the entire U.S. acute ischemic stroke (AIS) population. In GWTG-Stroke, trained personnel abstract reliable deidentified demographic, clinical, and event information from participating hospitals using an internet-based patient management tool [8]. AIS is accurately identified, and clinical variables such as admission and discharge stroke severity are systematically included, alongside detailed clinical data not available in administrative claims data alone. GWTG-Stroke includes 1300-1500 hospitals per year, and details have been described previously [11,12]. Hospitals participating in the GWTG program do so on a voluntary basis. Although the GWTG program contains many small, rural, and nonacademic hospitals, these hospital types are underrepresented compared to the overall U.S. hospitalized population [9]. Therefore, the sampling strategy does not directly estimate national AIS clinical characteristics as currently structured.
To determine the total number of AIS hospitalizations in the U.S. and marginal population characteristics for post-stratification weights, target population counts are obtained from the NIS sponsored by the Agency for Healthcare Research and Quality. For 2012 to 2014, the NIS sampled 20% of the administrative discharge records from all participating hospitals (approximately 4300 hospitals) covering 95% of the U.S. population and 94% of all community hospital discharges [13]. While the NIS may be used to understand population rates of AIS, basic demographics, procedures, and costs, it lacks detailed clinical and outcomes data.

Study population
The target population for the post-stratification weighting procedure is the total number of AIS hospitalizations presenting to U.S. hospitals by year. The NIS defines the national AIS burden, stratified by year (2012 to 2014) and by the 9 U.S. Census divisions, preserving the smallest sampling unit recommended by the NIS sponsors.

Data definitions
AIS is defined using the primary discharge diagnosis from the first listed International Classification of Diseases, Ninth Revision (ICD-9) code for each NIS hospitalization [14]. AIS is defined in GWTG-Stroke based on abstracted discharge diagnoses (online supplement, eTable 1). GWTG-Stroke uses electronic case report form-based data extraction from clinical chart review to document patient-specific comorbid conditions. The NIS diagnostic and procedure estimates are based on administrative coding of ICD-9 diagnostic and procedure codes.

Statistical analysis
Two parallel methods are used to estimate post-stratification survey weights. Raking is an iterative procedure for minimizing the dispersion of weights for each observation relative to the average sample weight to approximate marginal counts for characteristics of interest. More recent research has advanced Bayesian interpolation statistical methods to estimate post-stratification weights and fit flexible analytic models. Both raking and the Bayesian interpolation method rely on anchoring estimates to select characteristics shared between disparate datasets in order to correct skewed distributions. For this study, select hospital and patient characteristics are added iteratively as anchoring variables to improve skewed representation within GWTG-Stroke. The epidemiologic estimates of AIS care from the two post-stratification methods are contrasted.
Standardized differences for all weighted characteristics are estimated for patient and hospital characteristics (anchored and non-anchored variables). We analyze the distribution of raking and Bayesian weights with histograms and treemaps to provide a perspective on the skewed representation of the GWTG-Stroke raw sample. Iterative model development is used to select the minimal set of hospital or patient characteristics necessary to limit extreme post-stratification weights while maintaining reliable population estimates for known NIS estimates.
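As an illustration of the standardized difference used to compare estimates, the following minimal Python sketch applies the common pooled-variance form for two proportions. The formula choice and the function name `standardized_difference` are assumptions for illustration; the paper does not state which variant was used. The inputs are the rural-hospital proportions reported in the Results (10.29% from the NIS and 6.02% after Bayesian weighting).

```python
import math


def standardized_difference(p_a: float, p_b: float) -> float:
    """Standardized difference between two proportions (pooled-variance form).

    Assumption: this is the commonly used definition for binary variables;
    the source does not specify the exact formula.
    """
    pooled_var = (p_a * (1.0 - p_a) + p_b * (1.0 - p_b)) / 2.0
    if pooled_var == 0.0:
        # Both proportions are 0 or 1 and identical in spread: no difference to scale.
        return 0.0
    return (p_a - p_b) / math.sqrt(pooled_var)


# Rural-hospital share of AIS hospitalizations, taken from the Results.
nis_rural = 0.1029    # NIS estimate
bayes_rural = 0.0602  # GWTG-Stroke after Bayesian post-stratification

d = standardized_difference(nis_rural, bayes_rural)
print(f"standardized difference: {d:.3f}")
```

Under this definition the rural-hospital comparison gives a value of roughly 0.16, which is in line with the 16% standardized difference reported for rural hospitals in the Results.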

Overview of the estimation problem
Suppose we want to estimate the proportion of eligible patients for different age categories in the population. For each census division (i.e., sample $s$) and for the elements $k$ in the census division, i.e., $k \in s$, we observe in the registry a number $x_k$ of hospitalizations, with some of them possibly under- (or over-) represented relative to the target population. Using data from the available registry, our goal is to estimate the probability sampling weight $w_k$ such that $\sum_{k \in s} w_k x_k = t_x$, where $t_x$ is the observed mean for the target population from the NIS [15]. For this study, we derive the post-stratification weights $w_k$ using two parallel approaches: raking and Bayesian interpolation.
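The calibration idea in this overview, that weighted registry observations should reproduce a known target quantity, can be checked numerically. The sketch below is a toy Python example with a single anchoring variable; all counts and target totals are hypothetical and are not taken from GWTG-Stroke or the NIS.

```python
import numpy as np

# Hypothetical registry sample: 1 = rural hospital, 0 = urban hospital.
x = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])  # 2 rural, 8 urban observations

# Hypothetical target totals for the same variable (stand-ins for NIS counts).
t_rural, t_urban = 10.0, 20.0

# One-dimensional post-stratification: weight = target total / sample count.
n_rural = (x == 1).sum()
n_urban = (x == 0).sum()
w = np.where(x == 1, t_rural / n_rural, t_urban / n_urban)

# The weighted sample now reproduces the target totals.
weighted_rural_total = float((w * x).sum())
weighted_total = float(w.sum())
weighted_rural_share = weighted_rural_total / weighted_total

print(weighted_rural_total, weighted_total, round(weighted_rural_share, 4))
```

With one anchoring variable the weights have a closed form. Raking and the Bayesian approach described next address the harder case in which several margins must be matched at once.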

Raking procedure
Raking procedures are used to generate weights when known marginal counts are available for two or more categorical variable dimensions [16][17][18]. The raking algorithm creates an initial weight for all observations and then iteratively adjusts them to minimize the spread of weights, so no single observation is over- or under-represented in the data [17]. For example, if the target male population is 400,000 and the sample includes 200,000 males, an initial raking weight of 2 would apply to all male observations. Raking attempts to minimize the difference between new weights and the initial weight to approximate the targeted population totals across multiple anchoring dimensions.
The initial or base weight $d_k$ is based on the population size, such that $d_k$ multiplied by the sample size equals the population size. The goal of a raking procedure is to minimize the sum of the differences between the new weights ($w_k$) and the base weight ($d_k$) [15]. Raking attempts to reach a predetermined target $t_x$ while minimizing the average distance of the weights from the base weight.
Typically, weighted variance estimation (i.e., the Horvitz-Thompson estimator) of structured data accounts for the inclusion probability of sampled data from a population [16]. Post-stratification variance estimation with raking uses an additive analysis of variance (ANOVA) of the residuals to fit the model [17,19]. Variables available in both GWTG-Stroke and the NIS are selected as anchoring variables to generate the raking weights using SAS 9.4 (SAS Institute, Inc., Cary, North Carolina). Shortcomings of this frequentist approach to probability weight generation remain. Statistical assumptions may not hold for variance estimation, especially for testing interactions and small-area estimation [20]. This procedure may also create negative weights in certain constrained data situations [21]. Variables evaluated for raking included: age quartiles, sex, race/ethnicity, region, payer, hospital bed size, hospital ownership (government, private non-profit, private investor-owned), and rural/urban status.
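The iterative adjustment described above can be sketched as iterative proportional fitting over the anchoring margins. The Python example below is a minimal illustration, not the SAS procedure used in this study. The function name `rake`, the toy sample, and the target totals are all hypothetical, and the sketch assumes every level of every anchoring variable has at least one sampled observation.

```python
import numpy as np


def rake(categories, targets, base_weights, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking) on categorical margins.

    categories: dict of name -> integer-coded array (one code per observation)
    targets:    dict of name -> array of target totals, one per level
    Assumes every level is observed at least once (no empty margins).
    """
    w = np.asarray(base_weights, dtype=float).copy()
    for _ in range(max_iter):
        max_gap = 0.0
        for name, codes in categories.items():
            target = np.asarray(targets[name], dtype=float)
            # Current weighted total for each level of this variable.
            current = np.bincount(codes, weights=w, minlength=len(target))
            max_gap = max(max_gap, float(np.max(np.abs(current - target) / target)))
            # Scale each observation so this margin matches its target exactly.
            w *= (target / current)[codes]
        if max_gap < tol:  # all margins already matched before this pass
            break
    return w


# Hypothetical sample of 10 observations with two anchoring variables.
sex = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])    # 0 = male, 1 = female
rural = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1])  # 0 = urban, 1 = rural

# Hypothetical target margins (both sum to a population of 1000).
targets = {"sex": [400.0, 600.0], "rural": [800.0, 200.0]}

base = np.full(10, 1000.0 / 10)  # base weight: population size / sample size
w = rake({"sex": sex, "rural": rural}, targets, base)

print(np.bincount(sex, weights=w))    # weighted totals by sex
print(np.bincount(rural, weights=w))  # weighted totals by rural status
```

Each pass rescales the weights one margin at a time, so matching one variable slightly disturbs the others; repeating the passes shrinks that disturbance until all margins agree with their targets.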
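One common way to formalize this objective is the calibration framework with a raking (multiplicative) distance function. The block below is a sketch under that assumption; the source does not state the exact distance function used, and the multiplier $\lambda$ is notation introduced here for illustration.

```latex
% Assumed formalization: calibration with the raking distance.
\begin{aligned}
\min_{\{w_k\}} \quad & \sum_{k \in s} \left[ w_k \log\!\left(\frac{w_k}{d_k}\right) - w_k + d_k \right] \\
\text{subject to} \quad & \sum_{k \in s} w_k x_k = t_x .
\end{aligned}
% The first-order conditions give multiplicative adjustments of the base weights:
w_k = d_k \exp\!\left( x_k^{\top} \lambda \right)
```

Under this form each weight is the base weight multiplied by a positive factor, and when $x_k$ is a vector of category indicators the factors are products of per-category adjustments.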

Bayesian population interpolation
The Bayesian population interpolation approach frames post-stratification weights as estimated from the posterior distribution of anchoring variables for the target population (i.e., the total U.S. AIS population). The Bayesian model allows for greater flexibility and the ability to integrate information from multiple sources that account for the known marginal and joint distributions of various population characteristics over time. For this study, only the NIS is required to calibrate post-stratification weights. The observed proportions from GWTG-Stroke serve as Bayesian prior information within the model and are non-representative of the target population. The Bayesian model estimates post-stratification weights by integrating prior and posterior information for the anchored variables. The observed GWTG-Stroke dataset (Bayesian prior), when fit to the marginal distribution of the anchoring characteristics, generates post-stratification weights [22,23]. The fundamental model is described as follows: let $p_m$ represent the observed proportion for a given variable $m$ for a subgroup, with $\varphi_m$ being the true population proportion. Observed counts are represented by the sample size multiplied by the observed proportion ($n_s p_m$).
Next, we build a multinomial observational model for adjusting the observed and known subgroup proportions, in which the observed counts $n_s p_m$ follow a multinomial distribution with probabilities $\varphi_m$. Here $n_s$ represents the size of the sample and $n_s p_m$ is the number of patients that fall within different subcategories (i.e., $m = 1, 2, 3$) of the sample of patients (for which the observed numbers are the naïve estimates). The number $n^r_s$ is the precision of the sampling distribution, which we specify in the application based on $n_s$. Under this model, the expected value of the proportion $p_m$ is thus $\varphi_m$. Finally, for a given cell, $\varphi_m = A_m \pi$, where $\pi$ is the true (unknown) cell population and $A_m$ is an indicator matrix whose components are equal to 1 when the observed cell is not empty and 0 otherwise.
For each year, the anchoring covariates form joint distributions between the observed GWTG-Stroke observations and target population proportions. The conjugate priors of the multinomial distribution, $\pi_\tau \sim \mathrm{Dir}(\pi_{\tau-1}, n_h)$, are Dirichlet models linked through a stochastic relationship (represented by the indexes $\tau$) between each GWTG-Stroke observation and the marginal and joint distributions for the target AIS population derived from the NIS [24]. The hyperparameter $n_h$ models the degree of pooling across available registries, to which we assign a low prior. The Bayesian model includes permutations of all anchored variable combinations as population subgroups. For variable combinations where GWTG-Stroke lacked observations, non-zero cell populations (i.e., related $n_h$) are used for estimation. We assume a flat prior for the GWTG-Stroke observations to approximate the target population characteristics from the NIS. Once the posteriors of $\varphi_m = A_m \pi$ are calculated, we determine the weights $w_k$ as $w_k = p_m$, using the equality [1]. All Bayesian analyses are performed in R 3.6.1 (R Foundation, Vienna, Austria). Permission for this analysis was granted through the Duke Clinical Research Institute IRB.
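The role of Dirichlet-multinomial conjugacy, including why empty cells still receive non-zero estimates, can be illustrated with a deliberately simplified Python sketch. This is not the model fitted in this study (which links Dirichlet distributions across indexes and anchors to NIS margins); it shows only a single conjugate update with a flat prior, and the cell counts are hypothetical.

```python
import numpy as np

# Hypothetical cell counts for four subgroups; the last cell has no observations.
counts = np.array([60, 30, 10, 0])

# Flat Dirichlet prior: one pseudo-observation per cell (an assumption for illustration).
alpha_prior = np.ones_like(counts, dtype=float)

# Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
alpha_post = alpha_prior + counts
posterior_mean = alpha_post / alpha_post.sum()

# Posterior draws give credible intervals for each cell proportion.
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha_post, size=2000)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)

for m, (mean, lo, hi) in enumerate(zip(posterior_mean, lower, upper), start=1):
    print(f"cell {m}: mean={mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

The unobserved cell receives a small positive posterior mean rather than zero, which mirrors the use of non-zero cell populations for variable combinations lacking GWTG-Stroke observations.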

Results
A total of 1761 hospitals are included in the GWTG-Stroke registry between 2012 and 2014. We excluded hospitals in which hospital characteristics of interest are not fully recorded in the database. The final cohort included 726,390 patients across 1546 hospitals, representing the raw GWTG-Stroke cohort prior to weighting (Fig. 1 and Online Supplement eTable 2, 3, 4). Initially, we attempted a parsimonious model to generate the weights using only select hospital characteristics: ownership, rural/teaching, and bed size stratified by Census division. After observing inadequate representation for select race/ethnic minorities, a decision was made to include patient-level race/ethnicity to derive post-stratification weights. Weights are unique for each hospitalization observed in GWTG-Stroke. The final raking and Bayesian post-stratification weight models used hospital characteristics for ownership, rural/urban and teaching status, and bed size, followed by race/ethnicity at the patient level.
There were an estimated 1,388,296 AIS hospitalizations between 2012 and 2014 in the U.S. For the raking method, anchored characteristics in the weighted GWTG-Stroke sample matched the exact population totals estimated from the NIS. This is to be expected unless matching two or more marginal characteristics is mathematically prohibitive (Table 1). The Bayesian method generates population totals with no more than 5-10% variance from the NIS estimates. The NIS estimates that 10.29% of AIS hospitalizations presented to rural hospitals, whereas the unweighted GWTG-Stroke representation is 3.49% and, after post-stratification using Bayesian-derived weights, 6.02%, which is 44% lower than expected. Age distributions for both methods are extremely similar. Sex, race/ethnicity, health insurance status, comorbidities, vital and laboratory measurements, arrival information, and hospital characteristics are also similar between the raking and Bayesian methods. Post-stratification estimates stratified by year and U.S. Census division are available in the Online Supplement eTable 5 through 7. The NIS does not provide any clinical data such as medication lists, vitals and laboratory measurements, stroke severity, and certain discharge disposition data. The NIS definitions for health insurance status did not align with the GWTG definitions and therefore were not included in Table 1. In GWTG, there are small differences in the prevalence of comorbidities between the raking and Bayesian weighting methods. NIS comorbidities are based on administrative coding only, while GWTG-Stroke is based on chart abstraction. There are minimal differences in summary vital and laboratory measurements, arrival information, baseline medication usage rates, and inpatient outcomes between the two weighting approaches.
On admission, we note that 49.2% of stroke patients nationally are using antiplatelet medications, 15.5% anticoagulants, 69.1% anti-hypertensives, 43.6% cholesterol-lowering medications, and 27.4% diabetic medications. With respect to disposition, 47.6% of patients are discharged home, 40.2% to transitional care facilities, and 4.6% with hospice-related services.
For the raked post-stratification weights, the median weight is 1.386, the weight at the 99th percentile is 6.881, and the maximum weight is 30.775 for individual GWTG-Stroke observations (Fig. 2 A and Online Supplement eFigure 1). For the Bayesian post-stratification weights, the median weight is 1.329, the weight at the 99th percentile is 11.201, and the maximum weight is 515.689 (Fig. 2 B and Online Supplement eFigure 2).
Color treemaps permit visualization of the strata where larger weights are concentrated for select characteristics (Figs. 3 and 4). Overall, given the lower representation of rural hospitals in GWTG-Stroke, rural hospitals receive weights in the 6 to 8 range using the raking procedure. The Bayesian approach results in mostly smaller weights on average in the rural areas. However, post-stratification estimates for rural hospitals using the Bayesian method are underestimated, with a standardized difference of 16% compared to the raking procedure. When looking at the distribution of post-stratification weights by race/ethnicity, raking results in average weights in the 6 to 8 range for minorities in the "Other" category. Using the Bayesian method, we observe some more extreme weights for "Other" race/ethnic minorities living in divisions 4 and 6.
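The summary statistics reported for each set of weights (median, 99th percentile, maximum) are straightforward to compute once weights exist. The Python sketch below uses simulated, hypothetical weights drawn from a lognormal distribution purely for illustration; it does not reproduce the GWTG-Stroke weights, and the helper name `summarize_weights` is an assumption.

```python
import numpy as np


def summarize_weights(weights):
    """Return the median, 99th percentile, and maximum of a weight vector."""
    w = np.asarray(weights, dtype=float)
    return {
        "median": float(np.median(w)),
        "p99": float(np.percentile(w, 99)),
        "max": float(w.max()),
    }


# Simulated right-skewed weights (hypothetical; not the study's weights).
rng = np.random.default_rng(42)
weights = rng.lognormal(mean=0.3, sigma=0.6, size=10_000)

summary = summarize_weights(weights)
print(summary)
```

Comparing the 99th percentile with the maximum is a quick way to see whether a handful of observations carry extreme weights, as is the case for the Bayesian weights reported above.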

Discussion
The characteristics and risk factors of patients presenting with stroke nationally are not well understood given the lack of a centralized national surveillance system. Hospital care for AIS is frequently the first and last opportunity to rescue a life and reverse or prevent neurologic disability. Understanding the effectiveness of hospital systems at a national and regional level is needed to ensure both consistency and timeliness in the receipt of evidence-based care. We integrate two large data systems to make better population-wide clinical estimates of acute ischemic stroke in the U.S. This work demonstrates that methods exist to marry existing databases to make more reliable statistical inferences of population health and health services utilization.
The Greater Cincinnati/Northern Kentucky Stroke Study makes epidemiologic inferences using case ascertainment for an urban population to report stroke incidence rates. The population described is slightly younger, more female, has a higher representation of African-Americans, and has higher rates of coronary artery disease and heart failure than estimated from the NIS or the weighted GWTG-Stroke cohort presented here (Table 2) [25][26][27].
The approach described in the present paper is a far more robust estimation of the characteristics of stroke presentation and the quality of hospital care nationally. The GWTG-Stroke patient registry captures 58% of all strokes nationally. By anchoring to the NIS, the weights are reasonable, with a median multiplier of 1.3 and very few extreme or outlier weights. The main challenge the model faced was estimation for small cohorts that are under-represented, such as rural populations and other minorities in select regions of the U.S. Overall, we provide one of the best estimations for clinical characteristics expected for the entire U.S. population using GWTG-Stroke with post-stratification survey weights.
For straightforward epidemiologic estimates of clinical data from a patient registry, raking procedures are sufficient and provide good statistical stability and precision. For more complex models where additional data integration or multivariable regression modeling is required, the Bayesian approach allows greater flexibility and more direct specification of the assumptions required for measuring estimands and credible intervals.
As patient registries have expanded, advanced statistical methods are available to transform non-random samples into representative population estimates. This research demonstrated that both traditional and Bayesian methods perform well to reshape unstructured data and make inferences regarding the U.S. population. This is the first study to our knowledge that has transformed a patient registry using post-stratification weights to represent a larger population of interest. The ability to translate observations from large registries to a national scale would fill a considerable void in the surveillance of the clinical characteristics, quality of care, and outcomes for AIS hospitalizations nationally [28].
There are limitations to this work. GWTG-Stroke is a voluntary program for quality improvement. Hospitals that do not participate may be more likely to lack systems for quality improvement, and therefore measures of the timeliness or completeness of AIS treatment may be biased in a favorable direction. Coding accuracy of comorbid conditions remains an issue both for administrative data from the NIS and for data abstracted from inpatient charts in GWTG-Stroke. Large post-stratification weights are applied to under-represented patient populations such as those in rural areas and race/ethnic minorities. Applying these methods to smaller sample sizes may generate less reliable estimates and may not adequately capture the diversity in patient populations. Given there is no gold standard to compare certain statistics we estimated for the U.S. AIS population, we cannot reliably test any biases that might have arisen based on the two approaches used to generate post-stratification weights. These weights are generated retrospectively, but the same methods will allow for prospective post-stratification and continuous calibration with changes in secular trends of both stroke presentation and GWTG-Stroke center participation.
Fig. 3 Treemaps of weighting stratified by U.S. Census division and rural/teaching hospital status. a, b: The treemaps provide a perspective of population size (box size) across region and hospital characteristic to describe the target population. The color shows the average size of the post-stratification weights used for each observation within Get With The Guidelines-Stroke using the post-stratification approach. The more yellow and red regions of the treemaps highlight under-represented populations that required larger relative weights to model the target national population.

Conclusion
As healthcare in the U.S. is decentralized, there are immense practical and financial obstacles to building national or regional AIS surveillance systems. Leveraging existing patient registries such as GWTG-Stroke and applying post-stratification weights to reshape unstructured data is an efficient means of providing population surveillance of clinical measurements and outcomes not easily measured otherwise. Both raking and Bayesian approaches provide reasonably accurate estimates for describing health service utilization and the quality of care from a national perspective. We have provided a demonstration of how future researchers may approach non-survey data to achieve better representation of a target population of interest. Both the raking and Bayesian interpolation methods of generating post-stratification weights may be applied to more advanced statistical modeling approaches to improve population-wide inference and the surveillance of health care quality and outcomes.