National weighting of data from the Behavioral Risk Factor Surveillance System (BRFSS)

Background: The Behavioral Risk Factor Surveillance System (BRFSS) is a network of health-related telephone surveys, conducted by all 50 states, the District of Columbia, and participating US territories, that receive technical assistance from CDC. Data users often aggregate BRFSS state samples for national estimates without accounting for state-level sampling, a practice that could introduce bias because the weighted distributions of the state samples do not always adhere to national demographic distributions.
Methods: This article examines six methods of reweighting, which are then compared with key health indicator estimates from the National Health Interview Survey (NHIS) based on 2013 data.
Results: Compared to the usual stacking approach, all of the six new methods reduce the variance of weights and design effect at the national level, and some also reduce the estimated bias. This article also provides a comparison of the methods based on the variances induced by unequal weighting as well as the bias reduction induced by raking at the national level, and recommends a preferred method.
Conclusions: The new method leads to weighted distributions that more accurately reproduce national demographic characteristics. While the empirical results for key estimates were limited to a few health indicators, they also suggest reduction in potential bias and mean squared error. To the extent that survey outcomes are associated with these demographic characteristics, matching the national distributions will reduce bias in estimates of these outcomes at the national level.


Background
The Behavioral Risk Factor Surveillance System (BRFSS) is a network of health-related telephone surveys, conducted by all 50 states, the District of Columbia, and participating US territories, that receive technical assistance from CDC [1]. Annually, in the national aggregate, the BRFSS exceeds 400,000 interviews, with questions focusing on health-related risk behaviors, chronic health conditions, and use of preventive services. Each state samples from adults (aged 18 and older) living in private residences using an overlapping, dual-frame landline and cell phone sample.
The BRFSS includes a core standardized questionnaire with optional modules of set questions that states may adopt according to their needs [1]. CDC provides guidance to data users on the appropriate weights to use if variables in analyses are taken from modules used by some of the states or taken from split samples. BRFSS data users often aggregate the state samples from the core questionnaire to use as a national database without accounting for the state-level sampling of the data. Currently, CDC provides no additional guidance to BRFSS data users on how to adjust the weights provided for each individual state sample when they try to aggregate the state samples. As a result, these data users could introduce bias because the weighted distributions of the state samples do not always adhere to national demographic distributions. This article describes the statistical methodology we developed to compute national weights, as well as weighted national estimates and variance estimates, using BRFSS data aggregated across states.
The BRFSS currently uses a fully overlapping sample of landline and cell phone numbers. Currently, states must complete 35% of all interviews by cell phone, although some states interview as much as 65% of their samples by cell phone. States adopt a standard calling protocol each year [1]. States determine a sample design by constructing one or more sub-state regions from which strata will be taken. Given the ability to determine location from landline phone numbers, allocation of landline numbers to strata is a relatively straightforward process. Landline samples also adopt an additional stratification. In this method, known as disproportionate stratified sampling or DSS, telephone numbers are classified into areas of high or medium residential strata. Numbers are taken from the strata at a ratio of 1.5:1, respectively, in order to increase sample efficiency. Landline interviews also include within-household sampling, since phones are generally shared among adults within the home.
Locations for cell phone numbers are more difficult to pinpoint. Some information on geostrata can be obtained from samples drawn from rate centers or billing information. In other cases, locational information is derived from respondents themselves, when asked about county and zip code. If a person has moved from one state to another and retained a cell phone number, the respondent is interviewed and the data are then transferred to the state where the respondent actually resides. A cell phone respondent with a Georgia phone number prefix who actually lives in Tennessee might therefore be interviewed by Georgia but have his/her data transferred to Tennessee after the interview is completed [1].
Once data are collected, CDC provides technical assistance to the states by weighting the data with a method called raking. The margins used for raking are the same for each state, although categories may be collapsed differently for some margins in different states. Weighting variables include age, race, sex, education, ethnicity, marital status, home ownership, sub-state region, and phone ownership (landline only, cell phone only, or dual user). CDC also assists states with data cleaning and data-quality reporting and releases a public-use data set. In 2011, the BRFSS moved from a simpler post-stratification process to raking [2] and strengthened its standardized protocols to allow for the inclusion of cell-phone interviews.
Users may take national estimates of health-related outcomes from a number of national health-data sources, such as the National Health Interview Survey (NHIS), the National Health and Nutrition Examination Survey (NHANES), and the National Survey on Drug Use and Health (NSDUH), all of which provide estimates on topics also found in the BRFSS. State-level estimates from the BRFSS are useful for many different types of research, but many data users also need to generate national estimates from the BRFSS, which often is the only provider (or one of a limited number of providers) of health indicator data, or which has a much larger number of respondents than other surveys (see Table 1). For example, the NHIS includes a number of items on food security including skipping meals, concern about having enough food, and not eating balanced meals [3], while the BRFSS includes specific items on what individual respondents have eaten [1] with a large enough sample to provide information that can be broken down by demographic subgroups. For these and other reasons, researchers might select the BRFSS when producing national estimates. Example prevalence estimates that have been published based on BRFSS data aggregated nationally include estimates for conditions such as obesity [4][5][6][7], asthma [8], flu vaccination [9], hypertension [10], and diabetes [11]. Further, nationally aggregated BRFSS data have also been used to estimate the percentage of US adults keeping a firearm at home [12] and those following recommendations regarding physical activity [13] and muscle strengthening [14]. This list is not intended to be comprehensive. Khalil and Crawford [15] identified 1,387 articles using BRFSS data from 1984 through 2012, and noted that in the last 10 years, publications focused on national data were most frequent.
The development of national weights, as well as a methodology for computing the associated variance estimates, is warranted, given the variation in sampling at the state level and the use of aggregated BRFSS data by many authors. The general methodology presented here, to apply a national weight to the state BRFSS samples, was first developed more than a decade ago [16] based on traditional methods for stratified random sampling [17]. The new methods are more powerful as they draw upon the common sampling and weighting (raking) methodology now used by all states. This article also provides a comparison of the methods based on the variances induced by unequal weighting as well as the bias reduction induced by raking at the national level, and recommends a preferred method. Combining the BRFSS state-level survey data into a national data set is a necessary initiative for the following reasons:
■ The system's surveys use the same basic sampling methodology across states;
■ These surveys produce state-level weights using the same basic methodology;
■ The surveys use the same core questionnaire across states;
■ BRFSS currently provides technical assistance to data users on a number of other analyses.
In 2011, the adoption of a raking methodology for post-stratification weight adjustments across all states strengthened the foundation for the development of a statistically valid national weighting methodology. A general overview of raking and its applications in combination with trimming is provided in Iachan [18] and in Battaglia, Frankel and Link [19]; the method adopted by the BRFSS is described in CDC's documentation [1].

Methods
This paper examines alternative approaches for generating national weights. The data file used in these analyses was the 2013 BRFSS public-use data file. These approaches all begin with the state-level weights now computed in the BRFSS system. The baseline method for our comparisons is a simple method that concatenates the data with the current state-level weights. Among the several limitations of this simple method, perhaps the most important is that the weighted distribution across key demographics does not necessarily match known national demographic distributions. To the extent that survey outcomes are associated with these demographic characteristics, matching the national distributions may reduce bias in estimates of these outcomes at the national level. The current BRFSS state-level weighting methodology includes a raking process, an iterative form of poststratification that ensures that weights sum to known population totals for key demographics in each state. Some (but not all) of the new methods developed for national weighting involve an additional layer for the raking that adds the state as a margin. This step ensures that using the national weights at the state level will reproduce the usual state estimate, for every state and every estimate.
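The raking step described above can be illustrated with a minimal sketch of iterative proportional fitting. This is not the BRFSS production procedure: the function name `rake`, the two toy margins, and the control totals are hypothetical, and the sketch omits trimming, category collapsing, and convergence diagnostics.

```python
from collections import defaultdict


def rake(records, weights, margins, max_iter=100, tol=1e-8):
    """Rake starting weights so weighted totals match each margin's control totals.

    records: list of dicts, one per respondent, e.g. {"sex": "F", "age": "18-44"}
    weights: list of starting weights (e.g. design weights), one per respondent
    margins: dict mapping a variable name to {category: population total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this margin
            totals = defaultdict(float)
            for rec, wi in zip(records, w):
                totals[rec[var]] += wi
            # Scale every weight so this margin matches its control totals
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        # Stop once a full pass over all margins leaves the weights unchanged
        if max_change < tol:
            break
    return w


# Hypothetical toy data: four respondents and two margins (not BRFSS values)
records = [
    {"sex": "F", "age": "18-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "18-44"},
    {"sex": "M", "age": "45+"},
]
margins = {
    "sex": {"F": 52.0, "M": 48.0},
    "age": {"18-44": 45.0, "45+": 55.0},
}
raked = rake(records, [1.0, 1.0, 1.0, 1.0], margins)
```

Adding state as one more entry in `margins`, with state population totals as the targets, would correspond to the extra raking layer mentioned above, under the same assumptions.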
An assessment of the weights considers estimated bias and variances, as well as the mean squared error (MSE) for key health risk indicators. While a direct measure of bias is available for key demographic variables, an indirect or estimated bias is necessary for other variables including health outcomes. We compare the national estimates with a benchmark provided by the National Health Interview Survey (NHIS) data for comparable health indicators. The NHIS was chosen as a standard because it provides both the largest sample and a questionnaire that is similar to the BRFSS. NHIS also provides summary annual estimates [20] produced using data fielded during the same time period as the BRFSS. The NHIS is itself a survey and therefore is subject to measurement error within its estimation. Despite the known internal variance within estimates derived from the NHIS, its use as a validation tool is widely accepted. A number of studies have used NHIS to validate estimates from the BRFSS in the past [21][22][23][24]. We developed a range of weighting methods that may improve upon the method that aggregates the BRFSS using state-level weights to form a national data set.

State weights
The state-level weights are the foundations on which the national weights will be computed in the second part of the methods. The weights start from design weights (also known as base weights or sampling weights) computed as the reciprocal of the probabilities of selection. States choose to stratify samples by geographic regions. The states make use of disproportionate stratified sampling for fielding efficiency, and the design weights reflect these differential selection probabilities. The design weights also include a correction for the use of overlapping dual landline and cell phone frames. Finally, the weights are raked [19], that is, iteratively fitted to the population distributions used as margins shown in Table 2. The BRFSS uses both the American Community Survey (ACS) and Nielsen Claritas for control totals to weight data at the state and sub-state regional level, with the exception of phone usage, which is taken from the National Center for Health Statistics (NCHS) [1].
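As a rough illustration of a design weight computed as the reciprocal of a selection probability, the sketch below assumes a simplified form: the stratum sampling rate is inverted and then adjusted for the number of adults and telephones in the household. The function name, argument names, and numbers are hypothetical, and the dual-frame overlap correction is deliberately left out.

```python
def design_weight(stratum_numbers, sampled_numbers, adults=1, phones=1):
    """Simplified design weight: inverse of the selection probability.

    stratum_numbers: telephone numbers available in the stratum (assumed known)
    sampled_numbers: telephone numbers sampled from that stratum
    adults: adults in the household (one adult is selected per household)
    phones: telephone lines through which the household could be reached
    """
    # Inverse of the stratum sampling rate
    stratum_weight = stratum_numbers / sampled_numbers
    # More adults lowers an individual's chance of selection;
    # more phone lines raises the household's chance of selection
    return stratum_weight * adults / phones


# Hypothetical example: 1,500 of 150,000 numbers sampled, two adults, one phone
w = design_weight(150000, 1500, adults=2, phones=1)
```

Under these assumptions, a stratum sampled at a lower rate yields proportionally larger weights, which is why disproportionate sampling across strata and states translates into weight variability.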

Variances
As would be expected, there is variability in state-level weights (design weights or sampling weights), which reflects the unequal sampling rates adopted across states. The base weights are computed as the reciprocal of the sampling probabilities, and for a stratified random sampling design these probabilities are, in essence, the sampling rates in the different strata and overall.
Because sample sizes are not proportional to state population sizes, the sampling rates are much larger in the smaller states than in the larger states, as illustrated in Table 3. The table shows that the sampling rate is 0.05% or less in large states, such as California, New York, and Texas; by contrast, the sampling rate is higher than 1.0% for small states such as Nebraska, Montana, South Dakota, and Wyoming. Table 3 also presents the design effect (DEFF) due to weighting at the state level, that is, the component of the DEFF due to unequal weighting effects. It gauges the impact of the weight variability on sampling error under two scenarios: a) under simple random sampling, and b) by allowing for the impact of unequal weighting effects.
The measure of sampling error shown in this table is the margin of error, i.e., the half-width of a 95% confidence interval. It is also worth noting that design effects are high for Florida as the state oversampled smaller counties that year, as it does every 3 years.
The national design effect of 4.49, which applies to national estimates produced using the concatenated state-level weights, is substantial. This design effect more than doubles the margin of error on such estimates due to the additional variance introduced by the concatenated or aggregated weights. Reduction of variance using a national weighting method, rather than aggregating the state weights, would therefore be preferable.
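The link between a design effect and the margin of error can be sketched as follows. The sketch assumes that the unequal-weighting component of the DEFF follows Kish's approximation, n * sum(w^2) / (sum(w))^2, which the text does not state explicitly; the function names and the prevalence and sample size used in the example are hypothetical.

```python
import math


def kish_deff(weights):
    """Design effect due to unequal weighting (Kish's approximation, assumed here)."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


def margin_of_error(p, n, deff=1.0, z=1.96):
    """Half-width of a 95% confidence interval for a proportion p."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical proportion and sample size; only the DEFF of 4.49 comes from the text
moe_srs = margin_of_error(0.5, 1000)             # simple random sampling
moe_deff = margin_of_error(0.5, 1000, deff=4.49)  # with unequal weighting
inflation = moe_deff / moe_srs                    # equals sqrt(4.49)
```

Because the margin of error scales with the square root of the DEFF, a design effect of 4.49 inflates it by a factor of about 2.12, consistent with the statement that the margin of error more than doubles.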

Bias and raking
It is reasonable to assume that the use of the aggregated state-level weights may lead to biases at the national level to the extent that, for key demographics, the aggregated weighted distribution does not match the national population distribution. For example, although each state's population is appropriately weighted, the estimated percentage for Hispanics is 15.5% with the aggregated weights, while a national weighting method would reduce that proportion to 15%, a more accurate representation of the national percentage. The demographic biases in the aggregated method, therefore, may have implications for health outcomes that vary across demographic groups. To control for this potential bias, the national weights could be raked at the national level using as many of the raking dimensions, among those used at the state level, as possible for convergence and stability. In addition, national raking could use states as an additional margin to preserve the state totals and to reproduce state estimates. We therefore produced a series of reweighting methods using a range of raking margins defined in Table 4, in addition to the state-level margins defined in Table 2. Some of the national raking methods add margins to the first eight, starting with the overall state margins and then adding cross-classifications of state with key demographic variables. Each of these reweighting methods starts with the original BRFSS design weights and readjusts the raking process at the national level.
The first reweight uses the original raking margins as described in Table 2, but readjusts to reflect a single national demographic weighting rather than merely aggregating the states' unequal samples. The second reweight uses the original eight raking margins as well as state (Margin 9). The third reweight includes three classifications (age, sex, and race/ethnicity) by state. An additional three reweighting methods are tested in an effort to reduce the overall variability of the weights. These three methods use the same overall raking margins as the first three methods but collapse some demographics (race and age) into larger categories. Some additional collapsing of margins is performed on individual cells to ensure that all cells obtain a minimum sample size of 300 or a minimum sample percentage of 5.0%. In Methods 4-6, margins 6 and 7 were collapsed. Race/ethnicity in margin 6 was collapsed to non-Hispanic White and Other for males; non-Hispanic White, non-Hispanic Black, and Other for females. In margin 7, race/ethnicity was collapsed to non-Hispanic White and Other.
In total, six national weighting strategies are tested: Method 1 uses the same margins as the original BRFSS, but weighted at the national level; Method 2 uses the BRFSS margins at the national level and adds state; Method 3 uses the BRFSS margins and adds state with three additional state cross-classifications; Methods 4 through 6 use the same margins as Methods 1 through 3, respectively, but with collapsed demographic categories (see Table 4).
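The collapsing rule used for Methods 4-6 can be sketched as below. The sketch assumes one reading of the rule, namely that a category is adequate if it reaches either 300 respondents or 5.0% of the sample, and that inadequate categories are folded into a single residual group. The function name, the label "Other", and the toy counts are hypothetical, and the sketch does not re-check whether the residual group itself meets the thresholds.

```python
from collections import Counter


def collapse_small_categories(values, min_count=300, min_share=0.05,
                              other_label="Other"):
    """Fold categories that miss both thresholds into a residual category."""
    counts = Counter(values)
    n = len(values)
    # A category is small only if it fails the count AND the share threshold
    small = {
        cat for cat, k in counts.items()
        if k < min_count and k / n < min_share
    }
    return [other_label if v in small else v for v in values]


# Hypothetical margin with three categories (not BRFSS data)
values = ["A"] * 900 + ["B"] * 80 + ["C"] * 20
collapsed = Counter(collapse_small_categories(values))
```

In this toy case, category B is kept because it reaches 8% of the sample even though it has fewer than 300 respondents, while category C misses both thresholds and is collapsed.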

Results
The methods are compared in terms of the estimated variance and bias of resulting weighted survey estimates. The estimated variances are gauged in two ways. First, in terms of the variability in the weights, we assessed a pure contribution of unequal weighting to the design effects and survey variances. Second, using a more empirical approach, we looked at the estimated variances for a number of key health indicators. The indicators are for current smoking, diabetes, arthritis, asthma, stroke, lack of insurance, obesity, and HIV testing. Finally a single indicator, diabetes, is examined by demographic subgroup to examine whether some of the methods may perform better for subgroup estimates.
We begin by comparing the biases in the different weighted estimates using the aggregated, traditional method and the six new national weighting methods. The biases are estimated by comparing the weighted estimates with a benchmark available from the National Health Interview Survey (NHIS), specifically, from Tables of Summary Health Statistics for 2013 [4].
Weighted prevalence estimates for a number of key health indicators are presented in Table 5 using the aggregated, traditional method and the six new national weighting methods together with the NHIS annual summary estimates [20] for the same or similar indicators. The NHIS estimates also permit the computation of a reduction in Mean Squared Error (MSE), estimated as the variance plus the square of the bias, where the bias is the absolute difference between the weighted estimate and the benchmark NHIS estimate: $\mathrm{MSE} = \mathrm{SE}^2 + (\text{Percent} - \text{Percent}_{\text{NHIS}})^2$.
There are little to no differences in the MSE reduction among the methods for the responses to the questions on stroke and insurance, but more discernible differences in the question on whether respondents had ever had asthma. While each method reduces the MSE by 0.012 to 0.013, making it difficult to ascertain differences between them, Methods 4 and 2 perform better than the others when estimates are compared against the NHIS benchmark.
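The MSE calculation described above can be written as a short sketch. The function name and the numbers in the example are hypothetical and are not taken from Table 5; the benchmark is treated as fixed, so its own sampling error is ignored.

```python
def mse_against_benchmark(estimate, standard_error, benchmark):
    """Estimated MSE: variance plus squared bias relative to a benchmark."""
    bias = estimate - benchmark
    return standard_error ** 2 + bias ** 2


# Hypothetical values in percentage points (not from the paper's tables)
mse_example = mse_against_benchmark(estimate=10.5, standard_error=0.1,
                                    benchmark=10.0)
```

In this made-up case the squared bias (0.25) dominates the variance (0.01), which illustrates why a method with a slightly larger variance can still have a lower MSE if it moves the estimate closer to the benchmark.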
Since health conditions vary by demographic characteristics, subgroups of respondents were examined for differences on responses to the diabetes question (see Table 6). Diabetes was selected, since it is a condition that varies by demographic group. Table 6 shows that for Hispanic group estimates, the MSE is lowest for Method 4.
The BRFSS calculates a design weight for each respondent based on the probability of selection. This weight takes into account the number of adults and telephones within each household as well as the size of the sample drawn within each state and substate region [1]. Table 7 presents the variability in the weights as measured by the design effect (DEFF) due to unequal weighting for each method. It also shows the margin of error (half-width for the 95% confidence interval) for each method. The table suggests a slight superiority for the two methods using 8 marginal classes, that is, a reduction in the variance of the national weights, which translates into more precise national estimates. Table 7 also indicates that Method 4 has the lowest design effect of 3.92, as well as a comparatively low coefficient of variation at 1.71. We stress that this analysis is confined to the DEFF component due to unequal weighting effects and therefore does not reflect the variance gains induced by stratification (e.g., by states). The stratification effects, or gains, are the same across all the national weighting methods. Incorporating these gains in the variance estimation process is also an important element of the national weighting estimation strategy developed in this research. Figure 1 shows the relative reduction in variance of the weights, compared with the aggregated (baseline) approach. This measure of relative reduction is based on the average variance of the key estimates considered in this empirical investigation. Specifically, the relative reduction is $(V_0 - V_i)/V_0$, where $V_i$ is the average variance under weighting method $i$ and $V_0$ is the average variance under the aggregated method. The figure shows that the largest reductions in average variance are achieved by the two methods with eight margins, i.e., Method 1 (without collapsing) and Method 4 (with collapsing), each reducing the variance in the weights by more than 14%.
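The two quantities discussed above can be sketched in a few lines. The sketch assumes that the relative reduction is defined as (V_0 - V_i) / V_0 and that the unequal-weighting DEFF relates to the coefficient of variation (CV) of the weights through Kish's approximation, DEFF = 1 + CV^2; neither formula is spelled out in the surviving text, and the function names and the variance inputs in the example are hypothetical.

```python
def relative_reduction(v_method, v_baseline):
    """Relative reduction in average variance versus the aggregated baseline."""
    return (v_baseline - v_method) / v_baseline


def deff_from_cv(cv):
    """Unequal-weighting design effect implied by the CV of the weights (Kish)."""
    return 1.0 + cv ** 2


# Hypothetical variances chosen only to illustrate a 14% reduction
example_reduction = relative_reduction(v_method=0.86, v_baseline=1.0)

# CV of 1.71 is the value reported for Method 4
implied_deff = deff_from_cv(1.71)
```

Under the Kish assumption, a CV of 1.71 implies a DEFF of about 3.92, which matches the design effect reported for Method 4.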
When demographic characteristics are taken into account, some differences are noted among the methods in that there is more variance. Of the national weighting methods, Method 4 performs better in terms of the NHIS benchmark, producing estimates closest to the NHIS benchmarks in five of the 12 cases. In addition, Method 4 reduces the MSE by a greater proportion than the other methods.
Thus Method 4 illustrates superiority over the other methods in terms of reduction in design effect and variance, and comes closer to matching national estimates from an outside source.

Discussion
The increased uniformity of BRFSS sampling and weighting methods across states since 2011 makes the aggregation more efficient than in earlier investigations, starting in the late 1990s and early 2000s [16]. At that time, the variation in the sampling and weighting methodologies across states created additional challenges.
One additional motivation for weighting the BRFSS data to national population totals is the fact that there are unequal selection probabilities among the state samples. It is clear that the design effect at the national level is high and that the methods proposed decrease the variance of the weights (as shown in Fig. 1).
For the limited set of estimates compared against the NHIS national estimates, the aggregated method of weighting produced estimates that were not statistically different from those of the other weighting methods tested (based on chi-square tests or t-tests of significance). Data users who conduct other analyses using additional variables and methods, however, have no prior knowledge of the degree to which the use of national weights will reduce bias in their outcomes. What is known is that the national weighting methods will lead to reductions in variance due to unequal weighting effects; in addition, the new methods will also account for the demographic biases built into the multiple sampling designs adopted by the states. The incentive for the adoption of national weighting comes from the reduction in the variance in the weights and improvement in demographic representation at the national level. Such improvements are the core of the argument in favor of national weights.
Table notes:
a In Methods 4-6, margins 6 and 7 were collapsed to achieve minimum sample sizes of 300 or minimum sample percentages of 5.0%. Race/ethnicity in margin 6 was collapsed to non-Hispanic White and Other for males; non-Hispanic White, non-Hispanic Black, and Other for females. In margin 7, race/ethnicity was collapsed to non-Hispanic White and Other.
b Margins 10 and 12 were collapsed within region to achieve minimum sample sizes of 250 or minimum sample percentages of 5.0%. The age categories of 18-24 and 25-34 were collapsed together in margin 10 for 16 states. In margin 12, all race/ethnicity categories were collapsed together for two states (Maine and Vermont).
Although both BRFSS and NHIS collect information on these outcomes, there are minor differences in question wording between the two surveys, as well as differences in the mode of administration.
While the overall reduction in MSE is small among the weighting methods tested, Method 4 is superior to the other weighting methods in terms of lower variance in weights (see Fig. 1). It also has a lower overall design effect than the other methods (see Table 7) and uses collapsed margins, making it somewhat more efficient to produce. When we compared prevalence estimates against those of the NHIS benchmark, we found that it performed better than the other national weighting strategies. Method 4 is similar to the weighting method used for individual states in that the margins are the same, but adjustments to the control totals are made to account for the national population, rather than aggregating from the state weighted totals. It is also worth noting that our updated recommendations, using 2013 as well as 2012 BRFSS data and focused more on variances, are not exactly the same as the more mixed picture depicted in national conferences (e.g., [25]). The previous work was more focused on bias reduction, where the methods seem equivalently effective at the national level. That work was also focused on a smaller subset of health indicators and older BRFSS data (2012 versus 2013).

Conclusions
The methodology described in this paper provides national weights for the state-based BRFSS. Data users who aggregate data from all states would benefit from the use of these new national weights. Persons using data from only a few states would find that the weights associated with state-level populations would be better suited to their analyses; an analysis that used data from a BRFSS module administered to residents in only a few states should use state-level weights rather than a national weight. Users should always take care to account for complex sample designs in any analyses that include BRFSS data, as they are both collected using   [2]. Unlike the usual aggregated approach, the new methods lead to weighted distributions that reproduce national population distributions for all key demographic groupings. To the extent that survey outcomes are associated with these demographic characteristics, matching the national distributions will reduce bias in estimates of these outcomes at the national level.