Estimating cardiovascular disease incidence from prevalence: a spreadsheet based model

Background Disease incidence and prevalence are both core indicators of population health. Incidence is generally not as readily accessible as prevalence. Cohort studies and electronic health record systems are two major way to estimate disease incidence. The former is time-consuming and expensive; the latter is not available in most developing countries. Alternatively, mathematical models could be used to estimate disease incidence from prevalence. Methods We proposed and validated a method to estimate the age-standardized incidence of cardiovascular disease (CVD), with prevalence data from successive surveys and mortality data from empirical studies. Hallett’s method designed for estimating HIV infections in Africa was modified to estimate the incidence of myocardial infarction (MI) in the U.S. population and incidence of heart disease in the Canadian population. Results Model-derived estimates were in close agreement with observed incidence from cohort studies and population surveillance systems. This method correctly captured the trend in incidence given sufficient waves of cross-sectional surveys. The estimated MI declining rate in the U.S. population was in accordance with the literature. This method was superior to closed cohort, in terms of the estimating trend of population cardiovascular disease incidence. Conclusion It is possible to estimate CVD incidence accurately at the population level from cross-sectional prevalence data. This method has the potential to be used for age- and sex- specific incidence estimates, or to be expanded to other chronic conditions.


Background
Disease incidence and prevalence are core indicators of population health. Accurate estimates of chronic diseases prevalence and incidence are essential for assessing population burden of disease and determining health priorities [1]. Incidence (rate of new cases among those at risk) is a better indicator of the progress of a disease epidemic, in comparison to prevalence (fraction of the population with the condition) for several reasons. First, change in prevalence lags behind the actual changes in population risk and incidence. Second, prevalence reflects historical trend and accumulation of cases, rather than recent incidence change. Many factors other than recent incidence change, such as population aging and survival improvement may influence the change in prevalence [2]. This is especially relevant to conditions such as myocardial infarction (MI) or stroke, a considerable portion of which are silent or transitory [3,4]. Some affected individuals may not seek medical care, although they are at similar risk for adverse outcomes such as mortality when compared to individuals with detectable signs and symptoms. Many of the first occurrences of MI and stroke are fatal [5], such that recorded incidence of such conditions typically based on hospitalized cases is likely to underestimate the actual disease burden.
The most direct approach to estimating cardiovascular disease incidence is through longitudinal observational studies. Such studies are time-consuming, expensive, and are disease specific. Electronic health record systems and disease registry systems provide new sources of incidence estimates, but these are not available in most of the developing countries. Alternatively, incidence could be estimated by mathematical models, with prevalence data at multiple time points, population demographic change, and mortality as fundamental input. Prevalence of cardiovascular disease, typically from cross-sectional surveys, are generally more easily obtainable [6]. Mathematical models with varying degrees of sophistication have been developed, including Hallett's method to estimate HIV incidence in Africa [7][8][9]. Theoretically, Hallett's framework should be applicable in different settings and for other diseases as long as the disease is not reversible and local population/mortality data are available [6,9]. However, the assumptions in this model were based on the HIV epidemic in Africa, and may not be applicable for cardiovascular diseases in a developed country.
Developing and testing a prevalence-incidence model (PI model) for cardiovascular diseases has both global and local significance. In Canada, for example, it is no longer possible to estimate the incidence of self-reported cardiovascular disease after the closing of the National Population Health Survey (NPHS) in 2012 [10]. While electronic health record systems are being set up in many locations across Canada, there is as yet no linked, national database. Reporting of all heart attack episodes (including silent cases and fatal cases) were rare, although hospitalized heart attack incidence rate was more frequently reported [11]. The PI model would be especially relevant for remote areas, in which cohort studies or electronic health record systems are likely to be unavailable. A PI model would also help to identify the gap between reported and true cardiovascular disease incidence, indicating priorities for health services planning.
In this study, we propose to modify the Hallett's method and test its applicability to estimating cardiovascular disease incidence in the North America setting. We shall determine the hazard/probability of developing new cases across the time interval between two crosssectional surveys and generate age-standardized incidence rates. We shall compare model estimated incidence rates with those obtained from cohort studies and population monitoring programs. If sufficient waves of cross-sectional surveys are available, the model could potentially also detect trends.

Methods
The HIV prevalence-incidence model Among PI models with different complexities, the one proposed by Hallett et al. has been widely cited and used [9]. Detail of model derivation and validation has been published elsewhere [9]. Briefly, the difference between observed prevalence in the second survey and the expected prevalence (estimated from prevalence in the first survey and survival fraction based on mortality) provides the incident cases; the proportion of disease-negative people, the mortality rate for these people and the time interval between the two cross-sectional surveys together generate the number of person-years. Age-and gender-specific prevalence and mortality should be used to obtain more accurate estimates. Mortality of people with/without the health outcome of interest can be based on population vital statistics or the literature. The model assumes prevalence and mortality are constant during the interval of two successive surveys to keep the model simple and easy to use.

Model adjustment and validation for cardiovascular diseases
Hallett's method should work for other diseases and settings as long as the disease is not reversible. We made two major modifications to Hallett's method to better fit cardiovascular diseases. First, the survival fraction of disease positive patients is estimated as a function of 30-day case fatality, 1-year, 5-year and 10-year survival/mortality rate whenever possible, instead of assuming a constant mortality rate between the time intervals of two surveys. Second, we calculated the age-standardized incidence rate to make the estimate more comparable to other health statistics in its target population.

Model validation
We tested the performance of our modified Hallett's method in two steps. First, we applied our model to estimate myocardial infarction (MI) incidence in the US population and compared estimated values to reported incidence rates from the national environmental public health tracking network (Tracking Network) of U.S. Centers for Disease Control and Prevention (CDC), existing epidemiological studies and population statistics. We chose MI in the US population as this condition was one of the most well studied cardiovascular outcomes. The mortality, prevalence, and survival data were relatively accurate and robust. Second, we expanded the outcome to the broader category of any heart disease (HD), which also includes heart failure and other forms of heart diseases, and compared incidence estimates with observed incidences from a national representative cohort study, in Canada. The cross-sectional surveys and the longitudinal survey we used share the same sampling framework and chronic disease module in their questionnaires. In the text hereafter, heart disease refers to combined MI, angina and heart failure in the Canadian population without further specification.

Data source
For the United States, we chose the National Health and Nutrition Examination Survey (NHANES), which provides MI prevalence estimates in the US population at two-year intervals from 1999 to 2012 [12]. NHANES is a repeated cross-sectional survey of a nationally representative sample of the US population, with a multistage, stratified sampling design. Our study population included all participants aged 35 years and older in the 7 consecutive waves of the survey. Participants who answered "Yes" to questionnaire item MCQ160e ("Has a doctor or other health professional ever told you that you had heart attack (also called myocardial infarction)") will be classified as prevalent MI cases in this study. Sex-specific mortality data for MI patients were calculated from hospitalized mortality and 2-year mortality after hospital discharge [13,14]. Mortality data for the MI-negative population were calculated as the difference between all-cause mortality rate and the mortality rate of MI, from "Underlying Cause of Death 1999-2011" on the CDC WONDER Online Database [15]. Hospitalized MI incidence data were based on 26 States which participated in the CDC environmental public health tracking program [16].
For Canada, we selected the Canadian Community Health Survey (CCHS), which provides prevalence of self-reported heart diseases in the Canadian population, also at two-year intervals from 2001 to 2011 [17]. The CCHS is a series of cross-sectional surveys conducted biannually before 2007 and yearly since then. It uses a stratified, multistage probability sampling design. It collects information on health status, health care utilization and health determinants for the Canadian population. In this study, we identified participants aged 12 years and older in 6 consecutive waves of the survey. Participants who answered "Yes" to questionnaire item CCQ121 ("Have you had heart diseases which lasted 6 months or more and have been diagnosed by a health professional?") were classified as prevalent heart disease cases. The mortality rate of heart disease was calculated as the total of mortality from MI, angina and heart failure. Mortality rates for MI patients were assumed to be the same as the US population because studies suggest that 2-year mortality rates after MI are comparable between the two populations [13,14,18]. Mortality rates for angina and heart failure patients were based on Ontario data [19]. Mortality rates for heart disease free patients were calculated as the difference between all-cause mortality and mortality from heart disease, basing on data from Statistics Canada's Canadian Mortality Database, as available online from the CANSIM Table "Cause of Death 2000-2011" [20]. Heart disease incidence rates were calculated from the NPHS, which is a longitudinal health survey [21]. It started in 1994/1995 with an initial sample size of 17276 and ended in 2012 after another 8 follow-ups, which was conducted every two years.

Statistical analysis
The incidence rate of the outcome of interest in each age-and sex-group is estimated by equation (1), where I i is the incidence, F i approximates the proportional cohort size change over the time interval between two surveys (which was given by equation (2)), SP i is the proportion of disease-positive patients at the beginning of the first survey who survive to the next survey, SN i is the corresponding proportion for the disease-negative people, p i,0 is the prevalence of outcome of interest at the first survey, p i,T is the prevalence at the second survey respectively. In this paper, the time intervals for both the NHANES and the CCHS were 2 years. Equation (1) calculates the point estimates of incidence. Poissonbased confidence intervals for incidence can be estimated, treating the survey as a simple random design, without considering the survey design and sampling procedure. After obtaining the crude incidence rate for each age-and sex-group, a standardized incidence for the population is calculated using the direct method. For the United States cohort, we used the total US population in 2000 as the standard population; for the Canadian cohort, we used the total Canadian population from the 2001 Census. As each wave of the NHANES and the CCHS covers two years (e.g. [1999][2000][2001][2002], the incidence estimated from such two waves was presented as between January 1st of the second year of each wave, i.e.,between Jan 1, 2000, and Jan 1, 2002. Sex-specific five-year age group was used as the smallest analytic unit. Prevalence estimates for each age-and sex-group were calculated with proper sample weights and survey design effect used by the NHANES and the CCHS. It was substituted with the value from next age group if the prevalence was zero in the very young age groups. Baseline 2-year mortality after MI was calculated with equation (3), for both the US population and the Canadian population, where M MIin denotes MI mortality in hospital, and M MIout denotes MI mortality after hospital discharge. A 3% decrease in mortality rate for MI positive patients each year was assigned according to the literature [22]. The 2-year survival fraction for MI patients (SP MI ) was given by equation (4). Baseline 2year survival fraction after heart disease (SP HD ) for the Canadian population was calculated with equation (5), where w MI , w AG , w HF denotes the weighted proportion of MI/angina/heart failure to the sum of the three (in the CCHS cycle 2001). The mortality rate for MI or heart disease negative people were calculated with equation (6) and (7), as the difference between all-cause mortality (MV All − cause ) and MI/heart disease mortality (MV MI , MV HD ) from population vital statistics. The observed incidences of hospitalized MI in the US population were retrieved from CDC report, epidemiological studies and population statistics [2,16,23]. The observed incidences of heart disease in the Canadian population were calculated with corresponding weights from NPHS. In this paper, we will use the term "estimated" to refer to model calculated incidence, and the term "observed" to refer to incidence estimated from a cohort study, or reported in the literature or population statistics.
To validate the model performance, we compared the estimated incidence with observed values. We checked whether our estimated incidence fall into the high-low range (or 95% CIs) of observed incidence. When the definition of best available reported outcome is different from what we modeled, we adjusted our estimated values to make a fair comparison. For example, the most reliable statistics in the US population was hospitalized MI incidence. Thus we adjusted downward our estimated values, considering about 30% of MI would be silent cases, and another 20% would be fatal cases, to be more comparable [4,24,25]. The time trend of estimated incidence was investigated by fitting a linear regression, with estimated incidence as the y outcome and survey year minus 2000 as the single x predictor.

US Data
Estimated MI incidence in the US population (aged 35 years and older) from 2000 are provided in Table 1. The estimated MI incidence was 843 (1/100,000) between 2000 and 2001, and 678 (1/100,000) from 2010 to 2011, with a decreasing trend. The highest estimated MI incidence from the model was observed in 2002-2003. We also estimated hospitalized MI incidence to be 56% of MI incidence. Linear regression showed that the MI incidence decreased by approximately 32 (1/100,000) each year since 2000. The corresponding decrease in hospitalized MI was approximately 18 (1/100,000) respectively, which translated to a 3.8% decrease annually. We projected a 38% reduction in hospitalized MI incidence from 2000 to 2010, based on the prevalence from NHANES. Our hospitalized MI incidence estimates were compared against the reported values from those States which participated in the CDC environmental public health tracking program from 2000 to 2011 (Fig. 1). All of our estimated incidences fell into the high-low range of the reported values.

Canadian data
The model estimated heart disease incidence from CCHS, and observed heart disease incidence (with 95% CIs) from NPHS, for the Canadian population aged 12 years and older are shown in Table 2. We modelled two sets of heart disease incidence, with one set standardized to the NPHS age structure, and another set standardized to 2001 Canadian Census population. Our estimated heart disease incidence (first set) were very close to the NPHS values, with the smallest relative difference observed in 2009-2011 (0.7%), and the largest observed in 2003-2005 (12.9%). All of our first set estimated heart disease incidence fell into the 95% CIs of NPHS heart disease incidence. Figure 2 visually depicts how close between the NPHS incidence and our estimated incidence. The second set of heart disease incidence showed a decreasing trend among the Canadian population from 2001 to 2011. On average, heart disease incidence decreased by 13 (1/100,000) each year from 2001. The difference between the two sets of estimated heart disease incidence was due to the different age structures of the NPHS and the 2000 Canadian Census we used during direct standardization. We provide the model input, which is also part of the results, in the Appendix. The mortality rate of acute MI during hospitalization, the 2-year mortality rate of acute MI after hospital discharge, and the survival fraction 2 years after acute MI for the U.S. population in 1999 are presented in Table 3 in Appendix. The prevalence of MI in the U.S. population from 1999 to 2009 (in a 2-year time interval) estimated from NHANES are shown in Table 4 in Appendix. Table 5 in Appendix shows the 2-year mortality rate after MI, angina, heart failure, and the constructed 2-year survival fraction after heart disease. Table 6 in Appendix presents the prevalence of heart disease in the Canadian population, calculated from CCHS 2001 to 2011. Hospitalized MI incidence data from 26 States which participated in the CDC environmental public health tracking program (from 2000 to 2012) are provided in Table 7 in Appendix. Table 8 and 9 in Appendix provide the population age structure of the U.S. population and the Canadian population used for standardization, respectively.

Discussion
A useful prevalence-to-incidence method was tested and validated for cardiovascular diseases in the general population of USA and Canada in this paper. Accurate estimates of cardiovascular disease incidence are still not available for populations not served by electronic health care information systems, or representative longitudinal surveys. Such data are lacking for sub-populations undergoing rapid health transition, with a rapidly increasing burden of cardiovascular diseases, such as indigenous people in North America. The prevalence-to-incidence method offers an alternative option to monitor, and compare the emerging cardiovascular disease pandemic, both globally and locally. Although we tested the method using cardiovascular health outcomes, it should also apply in other chronic conditions as long as the condition is irreversible Estimates of hospitalized MI incidence in the U.S. population were in very close agreement with the actual statistics [16,23]. Our MI incidence estimates can theoretically capture all MI cases. Hospitalized MI incidence miss those individuals who do not receive medical care, or died before reaching care. To better compare and validate our method, we made assumptions to convert all MI incidence to hospitalized MI incidence based on literature [4,24,25]. All of our estimated incidences fell into the high-low range of reported hospitalized MI incidence in the corresponding year. Estimates of hospitalized MI incidence were close to the low end of reported values in certain years. Several reasons might explain that. First, the NHANES was designed to represent the U.S. population demographically, its prevalence estimates for a certain disease varied across waves to some extent. For example, the prevalence of MI in female in the 2009 wave was substantially lower, while the prevalence in male in the 2003 wave was substantially higher than neighboring waves. The fluctuation in prevalence would lead to variation in incidence estimates. Second, the definitions of MI in the NHANES and Tracking Network of CDC might not match exactly. What NHANES recorded was the self-reported doctor confirmed MI. The selfreported data generally suffered from recall bias,  compared to the hospital discharge data used by the CDC Tracking Network. Third, this study used data covering more than 10 years. Changes in diagnostic techniques and criteria, in the coding of MI, or in medical care access may all contribute to the fluctuation of MI incidence, both estimated from the model and reported by the CDC Tracking Network [26]. In addition, we used the same ratio to adjust overall MI incidence to hospitalized MI incidence for all the years, and that might also introduce some uncertainties. A declining trend was observed from our hospitalized MI incidence estimates, as well as the reported values from the Tracking Network of CDC. Another study showed a very similar trend of acute MI incidence rate from 1999 and 2008 in Northern California, the incidence of which also peaked around the year 2001 and then decreased gradually [11]. Our method yielded an average annually 3.8% decrease of hospitalized MI incidence in the general U.S. population, comparing to a 2.4% decrease in acute MI in North California from 1999 to 2008 [11], a 5.8% decrease in acute MI among the Medicare fee-forservice beneficiaries from 2002 to 2007 [27], and a 4.9% decrease in age-and biomarker-adjusted incidence of hospitalization for AMI or fatal CHD in the Atherosclerosis Risk in Communities Study (ARIC) from 1987 to 2008 [28]. As summarized above, the decreasing rate identified from our estimates was also in accordance with reported values from cohort studies or surveillance data.
We further tested our method in the Canadian population. The NPHS and the CCHS share similar survey framework and the same questionnaire for chronic conditions [17,21]. Our heart disease incidence estimates from different cycles of the CCHS were very close to the NPHS heart disease incidence when we standardized to the NPHS age structure. That result demonstrated that our method could provide accurate incidence estimate as a cohort study. More interestingly, the heart disease incidence estimates decreased constantly since 2001 if we standardized the results to the Canadian census population. This fact showed that our method was actually superior to closed cohort study (without buy-in participants during followup), in terms of estimating the incidence and its trend over time. It would be more consistent if we validated our method using MI as the outcome in the Canadian population first, however, unfortunately, the CCHS stopped to ask about MI since 2004. We derived the mortality of heart disease based on information for MI, angina and heart failure. The good agreement between incidence estimates and cohort study values demonstrated that other than a single health outcome (e.g. MI), this method could also be used for health outcomes with multiple components, e.g. heart disease (MI, angina, and heart failure), potentially stroke (ischemic, hemorrhagic, and unspecified) and other health outcomes.
Our modelling method also has limitations. It depends on the quality of prevalence and mortality data, and the extent of such data available. Assumptions and robust sensitivity analyses become essential in some circumstances. Our method was tested in short survey interval and small age-and sex-group basis. The combined incidences for male and female, and for all ages were compared against cohort study or population statistics. Future work is planned to test the model performance for subgroups, e.g. male and female separately, or in certain specific age group. The application of this method in other chronic conditions also needs to be tested, especially when their prevalence and mortality differ from MI or heart disease substantially.

Conclusion
In conclusion, a reliable prevalence to incidence method for cardiovascular health outcomes was tested and validated. The incidence estimates given by the method were on average within 10% of the values from a cohort study. This method could also capture the trend of incidence if multiple cross-sectional data are available. This method has the potential to be used in population without valid cardiovascular disease incidence statistics. The dark grey bars with confidence intervals represent the estimated incidence of heart disease from the National Population Health Survey. The light grey bars represent the estimated incidence of heart disease from our model. All the predicted values were within the 95% CIs of the observed incidence. The average difference between the predicted values and the observed incidence were less than 10% This files includes 7 tables with the following data: 1) Cohort mortality for US population with and without acute myocardial infarction in 1999, 2) Prevalence of myocardial infarction in NHANES participants (%), 3) Cohort mortality for Canadian population with and without heart disease in 2001, 4) Prevalence of heart disease in CCHS participants (%), 5) Hospitalized MI incidence in different states in the U.S. from 2000 to 2012, 6) Age distribution by sex, 2000 US population (%), 7) Age distribution by sex, 2001 Canadian population (%).