Estimating cardiovascular disease incidence from prevalence: a spreadsheet based model
© The Author(s). 2017
Received: 25 July 2016
Accepted: 26 December 2016
Published: 23 January 2017
Disease incidence and prevalence are both core indicators of population health. Incidence is generally not as readily accessible as prevalence. Cohort studies and electronic health record systems are two major ways to estimate disease incidence. The former are time-consuming and expensive; the latter are not available in most developing countries. Alternatively, mathematical models could be used to estimate disease incidence from prevalence.
We proposed and validated a method to estimate the age-standardized incidence of cardiovascular disease (CVD) using prevalence data from successive surveys and mortality data from empirical studies. Hallett’s method, originally designed for estimating HIV infections in Africa, was modified to estimate the incidence of myocardial infarction (MI) in the U.S. population and the incidence of heart disease in the Canadian population.
Model-derived estimates were in close agreement with observed incidence from cohort studies and population surveillance systems. This method correctly captured the trend in incidence given sufficient waves of cross-sectional surveys. The estimated rate of decline in MI incidence in the U.S. population was in accordance with the literature. This method was superior to a closed cohort in estimating the trend of population cardiovascular disease incidence.
It is possible to estimate CVD incidence accurately at the population level from cross-sectional prevalence data. This method has the potential to be used for age- and sex-specific incidence estimates, or to be extended to other chronic conditions.
Keywords: Incidence, Prevalence, Model, Cardiovascular disease, NHANES, Canadian Community Health Survey
Disease incidence and prevalence are core indicators of population health. Accurate estimates of chronic disease prevalence and incidence are essential for assessing the population burden of disease and determining health priorities . Incidence (the rate of new cases among those at risk) is a better indicator of the progress of a disease epidemic than prevalence (the fraction of the population with the condition), for several reasons. First, changes in prevalence lag behind actual changes in population risk and incidence. Second, prevalence reflects historical trends and the accumulation of cases rather than recent changes in incidence; many other factors, such as population aging and improved survival, may influence prevalence . This is especially relevant to conditions such as myocardial infarction (MI) or stroke, a considerable portion of which are silent or transitory [3, 4]. Some affected individuals may not seek medical care, although they are at a similar risk of adverse outcomes such as death as individuals with detectable signs and symptoms. Many first occurrences of MI and stroke are fatal , so the recorded incidence of these conditions, which is typically based on hospitalized cases, is likely to underestimate the actual disease burden.
The most direct approach to estimating cardiovascular disease incidence is through longitudinal observational studies, which are time-consuming, expensive, and disease-specific. Electronic health record systems and disease registries provide new sources of incidence estimates, but these are not available in most developing countries. Alternatively, incidence can be estimated by mathematical models that take prevalence data at multiple time points, population demographic change, and mortality as fundamental inputs. Prevalence data for cardiovascular disease, typically from cross-sectional surveys, are generally easier to obtain . Mathematical models of varying sophistication have been developed, including Hallett’s method for estimating HIV incidence in Africa [7–9]. In principle, Hallett’s framework should be applicable in other settings and to other diseases, as long as the disease is irreversible and local population and mortality data are available [6, 9]. However, the model’s assumptions were based on the HIV epidemic in Africa and may not hold for cardiovascular diseases in a developed country.
Developing and testing a prevalence-incidence model (PI model) for cardiovascular diseases has both global and local significance. In Canada, for example, it has no longer been possible to estimate the incidence of self-reported cardiovascular disease since the National Population Health Survey (NPHS) closed in 2012 . While electronic health record systems are being set up in many locations across Canada, there is as yet no linked national database. Reports covering all heart attack episodes (including silent and fatal cases) are rare, although hospitalized heart attack incidence rates are reported more frequently . A PI model would be especially relevant for remote areas, where cohort studies or electronic health record systems are unlikely to be available. It would also help to identify the gap between reported and true cardiovascular disease incidence, indicating priorities for health services planning.
In this study, we modify Hallett’s method and test its applicability to estimating cardiovascular disease incidence in the North American setting. We determine the hazard (probability) of developing new cases over the interval between two cross-sectional surveys and generate age-standardized incidence rates. We then compare model-estimated incidence rates with those obtained from cohort studies and population monitoring programs. If sufficient waves of cross-sectional surveys are available, the model could potentially also detect trends.
The HIV prevalence-incidence model
Among PI models of different complexity, the one proposed by Hallett et al. has been widely cited and used . Details of the model’s derivation and validation have been published elsewhere . Briefly, the difference between the observed prevalence in the second survey and the expected prevalence (estimated from the prevalence in the first survey and the survival fraction based on mortality) gives the incident cases; the proportion of disease-negative people, their mortality rate, and the time interval between the two cross-sectional surveys together give the number of person-years at risk. Age- and gender-specific prevalence and mortality should be used to obtain more accurate estimates. Mortality of people with and without the health outcome of interest can be based on population vital statistics or the literature. To keep the model simple and easy to use, it assumes that prevalence and mortality are constant during the interval between two successive surveys.
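The bookkeeping described above can be illustrated with a short Python sketch for a single age cohort. It is a simplified reading of the verbal description, not a transcription of Hallett et al.’s published equations or of this paper’s equations (1) and (2). It assumes that incident cases survive the interval like disease-negative people, approximates the cohort size ratio as F ≈ p0·SP + (1 − p0)·SN, and averages the disease-negative fraction at the two surveys to obtain person-years. The function names are hypothetical.

```python
def cohort_size_ratio(p0: float, sp: float, sn: float) -> float:
    """Approximate cohort size at the second survey relative to the first (F).

    Assumption: new cases arising during the interval survive like
    disease-negative people, so incidence itself does not change the total.
    """
    return p0 * sp + (1.0 - p0) * sn


def estimate_incidence(p0: float, pT: float, sp: float, sn: float, T: float) -> float:
    """Incidence rate per person-year for one birth cohort between two surveys.

    p0, pT : prevalence in the cohort at the first and second survey
             (the cohort is T years older at the second survey)
    sp, sn : fractions of disease-positive / disease-negative people surviving T years
    T      : interval between the two surveys, in years
    """
    F = cohort_size_ratio(p0, sp, sn)
    # Positives observed at T (as a fraction of the initial cohort) minus the
    # prevalent cases expected to survive = incident cases still alive at T.
    # Incident cases who die before the second survey are not counted here.
    new_cases = pT * F - p0 * sp
    # Person-years at risk: mean of the disease-negative fraction at the start
    # (1 - p0) and at the end ((1 - pT) * F), multiplied by the interval length.
    person_years = T * ((1.0 - p0) + (1.0 - pT) * F) / 2.0
    return new_cases / person_years


# Illustrative (made-up) inputs: prevalence rising from 5% to 7% over 2 years.
rate = estimate_incidence(p0=0.05, pT=0.07, sp=0.90, sn=0.98, T=2.0)
print(f"{rate * 100_000:.0f} per 100,000 person-years")
```

Ignoring deaths among new cases keeps the sketch short but biases incidence slightly downward; this is one reason the survival inputs matter.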
Model adjustment and validation for cardiovascular diseases
Hallett’s method should work for other diseases and settings as long as the disease is irreversible. We made two major modifications to better fit cardiovascular diseases. First, instead of assuming a constant mortality rate over the interval between two surveys, we estimate the survival fraction of disease-positive patients as a function of 30-day case fatality and 1-year, 5-year and 10-year survival (or mortality) rates, whenever these are available. Second, we calculate age-standardized incidence rates to make the estimates more comparable with other health statistics for the target population.
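One way to turn a handful of survival anchors (30-day case fatality and 1-, 5- and 10-year survival) into a survival fraction at any time point, for example the 2-year fraction SP, is to assume a constant hazard between consecutive anchors (piecewise-exponential survival). The sketch below does this by interpolating log-survival linearly. The interpolation scheme, the function name `survival_at`, and the anchor values are illustrative assumptions, not taken from this paper.

```python
import bisect
import math
from typing import Sequence, Tuple


def survival_at(t: float, anchors: Sequence[Tuple[float, float]]) -> float:
    """Survival fraction at time t (years) from sorted (time, survival) anchors.

    The anchors must start at (0.0, 1.0) and be sorted by time. Between
    consecutive anchors the hazard is assumed constant (piecewise-exponential
    survival); beyond the last anchor the last interval's hazard is extrapolated.
    """
    if t <= 0:
        return 1.0
    times = [a[0] for a in anchors]
    # Index k of the anchor that closes the interval containing t
    k = bisect.bisect_left(times, t)
    k = min(max(k, 1), len(times) - 1)
    (t0, s0), (t1, s1) = anchors[k - 1], anchors[k]
    hazard = -math.log(s1 / s0) / (t1 - t0)
    return s0 * math.exp(-hazard * (t - t0))


# Hypothetical anchors: 10% 30-day case fatality, then 85%, 70% and 50%
# survival at 1, 5 and 10 years after the event.
anchors = [(0.0, 1.0), (30 / 365, 0.90), (1.0, 0.85), (5.0, 0.70), (10.0, 0.50)]
sp_2y = survival_at(2.0, anchors)  # two-year survival fraction, usable as SP
```

A log-linear interpolation never produces survival above an earlier anchor, which a plain linear interpolation of mortality percentages would not guarantee once several sources are combined.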
We tested the performance of our modified Hallett’s method in two steps. First, we applied our model to estimate myocardial infarction (MI) incidence in the US population and compared the estimated values with incidence rates reported by the National Environmental Public Health Tracking Network (Tracking Network) of the U.S. Centers for Disease Control and Prevention (CDC), existing epidemiological studies, and population statistics. We chose MI in the US population because it is one of the most thoroughly studied cardiovascular outcomes, and the corresponding mortality, prevalence, and survival data are relatively accurate and robust. Second, we expanded the outcome to the broader category of any heart disease (HD), which also includes heart failure and other forms of heart disease, and compared incidence estimates with observed incidence from a nationally representative Canadian cohort study. The cross-sectional surveys and the longitudinal survey we used share the same sampling framework and the same chronic disease module in their questionnaires. Hereafter, unless otherwise specified, heart disease refers to MI, angina and heart failure combined in the Canadian population.
For the United States, we chose the National Health and Nutrition Examination Survey (NHANES), which provides MI prevalence estimates for the US population at two-year intervals from 1999 to 2012 . NHANES is a repeated cross-sectional survey of a nationally representative sample of the US population, with a multistage, stratified sampling design. Our study population included all participants aged 35 years and older in 7 consecutive waves of the survey. Participants who answered “Yes” to questionnaire item MCQ160e (“Has a doctor or other health professional ever told you that you had heart attack (also called myocardial infarction)”) were classified as prevalent MI cases in this study. Sex-specific mortality data for MI patients were calculated from in-hospital mortality and 2-year mortality after hospital discharge [13, 14]. Mortality for the MI-negative population was calculated as the difference between the all-cause mortality rate and the MI mortality rate, from “Underlying Cause of Death 1999-2011” on the CDC WONDER Online Database . Hospitalized MI incidence data were based on the 26 states that participated in the CDC Environmental Public Health Tracking Program .
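For MI, the two U.S. mortality inputs can be chained into a single two-year survival fraction for disease-positive people. The sketch below assumes that the two-year window starts at hospital admission (so the length of stay is ignored) and that the two stages are sequential. The function name and all numbers are hypothetical, not the values used in this paper.

```python
def survival_after_mi(in_hospital_mortality: float, post_discharge_2y_mortality: float) -> float:
    """Two-year survival fraction (SP) after an MI, from two sequential stages.

    A patient must first survive the hospital stay and then the two years after
    discharge; the hospital stay itself is assumed to be negligibly short.
    """
    for m in (in_hospital_mortality, post_discharge_2y_mortality):
        if not 0.0 <= m <= 1.0:
            raise ValueError("mortality proportions must lie in [0, 1]")
    return (1.0 - in_hospital_mortality) * (1.0 - post_discharge_2y_mortality)


# Hypothetical sex-specific inputs: (in-hospital mortality, 2-year post-discharge mortality)
inputs = {"male": (0.10, 0.15), "female": (0.14, 0.18)}
sp_by_sex = {sex: survival_after_mi(*rates) for sex, rates in inputs.items()}
```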
For Canada, we selected the Canadian Community Health Survey (CCHS), which provides the prevalence of self-reported heart disease in the Canadian population, also at two-year intervals, from 2001 to 2011 . The CCHS is a series of cross-sectional surveys conducted every two years before 2007 and yearly since then. It uses a stratified, multistage probability sampling design and collects information on health status, health care utilization, and health determinants for the Canadian population. In this study, we identified participants aged 12 years and older in 6 consecutive waves of the survey. Participants who answered “Yes” to questionnaire item CCQ121 (“Have you had heart diseases which lasted 6 months or more and have been diagnosed by a health professional?”) were classified as prevalent heart disease cases. The mortality rate of heart disease was calculated as the sum of mortality from MI, angina and heart failure. Mortality rates for MI patients were assumed to be the same as in the US population, because studies suggest that 2-year mortality rates after MI are comparable between the two populations [13, 14, 18]. Mortality rates for angina and heart failure patients were based on Ontario data . Mortality rates for people free of heart disease were calculated as the difference between all-cause mortality and mortality from heart disease, based on data from Statistics Canada’s Canadian Mortality Database, available online as CANSIM Table “Cause of Death 2000-2011” . Heart disease incidence rates were calculated from the NPHS, a longitudinal health survey . It began in 1994/1995 with an initial sample of 17,276 and ended in 2012 after eight further follow-up cycles conducted every two years.
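The disease-negative survival fraction (SN) can be sketched the same way for both countries. Subtract the cause-specific mortality of every component of the outcome (only MI for the U.S. analysis; MI, angina and heart failure for the Canadian one) from all-cause mortality. Then convert the residual rate to a T-year survival under the model’s constant-mortality assumption. Treating this residual population rate as the hazard of disease-free people is an approximation, and the function name and rates are hypothetical.

```python
import math
from typing import Sequence


def disease_free_survival(all_cause_rate: float, cause_rates: Sequence[float], T: float = 2.0) -> float:
    """T-year survival fraction (SN) for people without the outcome of interest.

    all_cause_rate : all-cause deaths per person-year in one age/sex group
    cause_rates    : deaths per person-year from each component of the outcome
                     (e.g. MI, angina and heart failure for "heart disease")
    T              : survey interval in years
    The residual rate is treated as a constant hazard over the interval.
    """
    residual = all_cause_rate - sum(cause_rates)
    if residual < 0:
        raise ValueError("cause-specific mortality exceeds all-cause mortality")
    return math.exp(-residual * T)


# Hypothetical rates for one age/sex group (deaths per person-year)
sn_heart = disease_free_survival(0.012, [0.002, 0.0005, 0.0015])  # MI + angina + HF
sn_mi_only = disease_free_survival(0.012, [0.002])  # MI only
```

Removing more cause-specific deaths leaves a lower residual rate, so SN for the broader heart disease outcome is slightly higher than for MI alone.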
The incidence rate of the outcome of interest in each age and sex group is estimated by equation (1), where $I_i$ is the incidence; $F_i$ approximates the proportional change in cohort size over the interval between two surveys (given by equation (2)); $SP_i$ is the proportion of disease-positive people at the first survey who survive to the next survey; $SN_i$ is the corresponding proportion for disease-negative people; and $p_{i,0}$ and $p_{i,T}$ are the prevalence of the outcome of interest at the first and second surveys, respectively. In this paper, the interval $T$ was 2 years for both NHANES and the CCHS. Equation (1) gives point estimates of incidence. Poisson-based confidence intervals for incidence can be estimated by treating the survey as a simple random sample, without accounting for the survey design and sampling procedure. After obtaining the crude incidence rate for each age and sex group, a standardized incidence for the population is calculated using the direct method. For the United States, we used the total US population in 2000 as the standard population; for Canada, we used the total Canadian population from the 2001 Census. Because each wave of NHANES and the CCHS covers two years (e.g., 1999–2000, 2001–2002), the incidence estimated from two successive waves is presented as applying to the interval between January 1 of the second year of each wave, e.g., between January 1, 2000, and January 1, 2002.
To validate model performance, we compared the estimated incidence with observed values, checking whether our estimates fell within the high-low range (or 95% CI) of observed incidence. When the best available reported outcome was defined differently from the outcome we modeled, we adjusted our estimates to make a fair comparison. For example, the most reliable statistic for the US population was hospitalized MI incidence, so we adjusted our estimates downward to make them comparable, assuming that about 30% of MIs are silent and another 20% are fatal [4, 24, 25]. The time trend of estimated incidence was investigated by fitting a linear regression with estimated incidence as the outcome (y) and survey year minus 2000 as the single predictor (x).
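The last two steps, a Poisson-based interval for each group-specific rate and direct standardization across groups, can be sketched as follows. The interval uses the common log-scale normal approximation, which is an assumption (the text does not say which Poisson interval was used), and, as described above, it ignores the complex survey design. Function names, rates, and weights are hypothetical.

```python
import math
from typing import Sequence, Tuple


def poisson_rate_ci(cases: float, person_years: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI for a rate, treating the case count as Poisson.

    Uses rate * exp(+/- z / sqrt(cases)); the complex survey design is ignored.
    """
    if cases <= 0 or person_years <= 0:
        raise ValueError("cases and person_years must be positive")
    rate = cases / person_years
    half_width = z / math.sqrt(cases)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


def direct_standardize(rates: Sequence[float], standard_pop: Sequence[float]) -> float:
    """Directly standardized rate: group-specific rates weighted by the
    standard population's share in each age/sex group."""
    if len(rates) != len(standard_pop):
        raise ValueError("one standard-population count is needed per group")
    total = sum(standard_pop)
    return sum(r * w for r, w in zip(rates, standard_pop)) / total


# Hypothetical crude incidence (per person-year) in three age groups, and
# hypothetical standard-population counts for the same groups.
crude = [0.002, 0.008, 0.020]
standard = [50_000, 30_000, 20_000]
asr = direct_standardize(crude, standard)  # age-standardized rate
low, high = poisson_rate_ci(cases=100, person_years=10_000)
```

Because the same group-specific rates can be weighted by different standard populations, the standardized figure depends on the chosen standard (e.g., the 2000 US population versus the 2001 Canadian Census).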
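The comparison steps can be sketched as below. Reading “30% silent, another 20% fatal” as additive, so that about half of all MIs appear as hospitalized cases, is an assumption made for this sketch. The range check and the ordinary least-squares trend follow the description above. Function names and the incidence series are hypothetical.

```python
from typing import Sequence, Tuple

SILENT_SHARE = 0.30  # approximate share of MIs that are silent
FATAL_SHARE = 0.20   # approximate share that are fatal without hospitalization
# Assumption: the two shares are additive, so about half of all MIs are hospitalized.
HOSPITALIZED_SHARE = 1.0 - SILENT_SHARE - FATAL_SHARE


def to_hospitalized(all_mi_rate: float) -> float:
    """Convert an all-MI incidence estimate to a hospitalized-MI equivalent."""
    return all_mi_rate * HOSPITALIZED_SHARE


def within_range(estimate: float, low: float, high: float) -> bool:
    """True if the estimate falls inside the reported high-low range (or 95% CI)."""
    return low <= estimate <= high


def linear_trend(years: Sequence[int], incidence: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of incidence on (year - 2000).

    Returns (intercept, slope); the slope is the average change per year.
    """
    x = [y - 2000 for y in years]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(incidence) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, incidence))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical series (per 100,000), for illustration only
intercept, slope = linear_trend([2000, 2002, 2004], [300.0, 280.0, 260.0])
```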
Estimated incidence rates of all and hospitalized myocardial infarction in the US population, 2000 to 2011 (per 100,000)
Columns: Estimated incidence of all MI | Estimated incidence of hospitalized MI^a

Incidence rate of heart disease in the NPHS cohort and model estimates for the Canadian population, 2001 to 2011 (per 100,000)
Columns: Lower 95% CI of reported incidence | Upper 95% CI of reported incidence | Estimated incidence 1^a | Estimated incidence 2^b
The model inputs, which are also part of the results, are provided in the Appendix. The in-hospital mortality rate of acute MI, the 2-year mortality rate after hospital discharge for acute MI, and the survival fraction 2 years after acute MI for the U.S. population in 1999 are presented in Table 3 in the Appendix. The prevalence of MI in the U.S. population from 1999 to 2009 (at 2-year intervals) estimated from NHANES is shown in Table 4 in the Appendix. Table 5 in the Appendix shows the 2-year mortality rates after MI, angina, and heart failure, and the constructed 2-year survival fraction after heart disease. Table 6 in the Appendix presents the prevalence of heart disease in the Canadian population, calculated from the CCHS 2001 to 2011. Hospitalized MI incidence data from the 26 states that participated in the CDC Environmental Public Health Tracking Program (2000 to 2012) are provided in Table 7 in the Appendix. Tables 8 and 9 in the Appendix provide the age structures of the U.S. and Canadian populations used for standardization, respectively.
In this paper, we tested and validated a prevalence-to-incidence method for cardiovascular diseases in the general populations of the USA and Canada. Accurate estimates of cardiovascular disease incidence are still unavailable for populations not served by electronic health care information systems or representative longitudinal surveys. Such data are lacking for subpopulations undergoing rapid health transition with a rapidly increasing burden of cardiovascular disease, such as indigenous people in North America. The prevalence-to-incidence method offers an alternative way to monitor and compare the emerging cardiovascular disease pandemic, both globally and locally. Although we tested the method using cardiovascular outcomes, it should also apply to other chronic conditions, as long as the condition is irreversible.
Estimates of hospitalized MI incidence in the U.S. population were in very close agreement with the actual statistics [16, 23]. In principle, our MI incidence estimates capture all MI cases, whereas hospitalized MI incidence misses individuals who do not receive medical care or who die before reaching care. To compare and validate our method, we therefore converted all-MI incidence to hospitalized MI incidence using assumptions based on the literature [4, 24, 25]. All of our estimates fell within the high-low range of reported hospitalized MI incidence in the corresponding year, although in certain years they were close to the low end of the reported values. Several factors might explain this. First, although NHANES was designed to be demographically representative of the U.S. population, its prevalence estimates for a given disease varied to some extent across waves. For example, the prevalence of MI in females in the 2009 wave was substantially lower, and the prevalence in males in the 2003 wave substantially higher, than in neighboring waves. Such fluctuations in prevalence lead to variation in incidence estimates. Second, the definitions of MI in NHANES and the CDC Tracking Network may not match exactly: NHANES records self-reported, doctor-confirmed MI, and self-reported data are generally more subject to recall bias than the hospital discharge data used by the CDC Tracking Network. Third, this study used data covering more than 10 years. Changes in diagnostic techniques and criteria, in the coding of MI, or in access to medical care may all contribute to fluctuations in MI incidence, both as estimated by the model and as reported by the CDC Tracking Network . In addition, we used the same ratio to convert overall MI incidence to hospitalized MI incidence for all years, which may also introduce some uncertainty.
A declining trend was observed in our hospitalized MI incidence estimates, as well as in the values reported by the CDC Tracking Network. Another study showed a very similar trend in acute MI incidence from 1999 to 2008 in Northern California, where incidence also peaked around 2001 and then decreased gradually . Our method yielded an average annual decrease of 3.8% in hospitalized MI incidence in the general U.S. population, compared with a 2.4% decrease in acute MI in Northern California from 1999 to 2008 , a 5.8% decrease in acute MI among Medicare fee-for-service beneficiaries from 2002 to 2007 , and a 4.9% decrease in age- and biomarker-adjusted incidence of hospitalization for AMI or fatal CHD in the Atherosclerosis Risk in Communities Study (ARIC) from 1987 to 2008 . The rate of decrease identified from our estimates is thus in accordance with values reported from cohort studies and surveillance data.
We further tested our method in the Canadian population. The NPHS and the CCHS share a similar survey framework and the same questionnaire for chronic conditions [17, 21]. Our heart disease incidence estimates from different cycles of the CCHS were very close to the NPHS heart disease incidence when we standardized them to the NPHS age structure. This result demonstrated that our method can provide incidence estimates as accurate as those from a cohort study. More interestingly, when standardized to the Canadian census population, the heart disease incidence estimates decreased steadily from 2001 onward. This showed that our method is actually superior to a closed cohort study (one that adds no new participants during follow-up) for estimating incidence and its trend over time. It would have been more consistent to validate our method first using MI as the outcome in the Canadian population; unfortunately, the CCHS stopped asking about MI in 2004. We derived the mortality of heart disease from information on MI, angina and heart failure. The good agreement between incidence estimates and cohort study values demonstrated that, beyond single health outcomes such as MI, this method can also be used for outcomes with multiple components, e.g., heart disease (MI, angina, and heart failure), potentially stroke (ischemic, hemorrhagic, and unspecified), and other health outcomes.
Our modelling method also has limitations. It depends on the quality and availability of prevalence and mortality data, so explicit assumptions and robust sensitivity analyses become essential in some circumstances. Our method was applied with short survey intervals and narrow age- and sex-specific groups, but only the combined incidence for males and females and for all ages was compared against cohort study or population statistics. Future work is planned to test model performance in subgroups, e.g., males and females separately, or specific age groups. The application of this method to other chronic conditions also needs to be tested, especially when their prevalence and mortality differ substantially from those of MI or heart disease.
In conclusion, a reliable prevalence-to-incidence method for cardiovascular health outcomes was tested and validated. The incidence estimates given by the method were on average within 10% of the values from a cohort study. The method can also capture the trend in incidence if multiple waves of cross-sectional data are available. It has the potential to be used in populations without valid cardiovascular disease incidence statistics.
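How the average annual rate of decrease is derived is not spelled out here. One common convention, shown only as an assumption, is the annual percent change from a log-linear fit, APC = 100·(e^b − 1), where b is the least-squares slope of ln(rate) on calendar year. The function name and the series are hypothetical.

```python
import math
from typing import Sequence


def annual_percent_change(years: Sequence[float], rates: Sequence[float]) -> float:
    """Annual percent change from a least-squares fit of ln(rate) on year.

    A negative value means the rate declines by that percentage per year.
    """
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs")
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return 100.0 * (math.exp(sxy / sxx) - 1.0)


# Hypothetical series falling by exactly 5% per year
years = [2000, 2001, 2002, 2003]
rates = [300.0 * 0.95 ** k for k in range(4)]
apc = annual_percent_change(years, rates)  # approximately -5.0
```

Unlike the slope of a linear fit on the original scale, this expresses the decline as a constant proportional change, which makes series with different absolute levels easier to compare.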
- CCHS: Canadian Community Health Survey
- CDC Tracking Network: The National Environmental Public Health Tracking Network of the U.S. Centers for Disease Control and Prevention
- NHANES: National Health and Nutrition Examination Survey
- NPHS: National Population Health Survey
Funding support from the Canada Research Chair Program to HMC is acknowledged. The funding body had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Availability of data and materials
The NHANES datasets analyzed in the current study are available at http://www.cdc.gov/nchs/nhanes/index.htm
The CCHS and NPHS datasets analyzed in the current study are available in the research data centers of Statistics Canada. Permission from Statistics Canada was required to access the datasets. A detailed description of the data is available at:
XH and HMC designed the study and the analytical plan. XH analyzed the data. XH, KY, and HMC interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Not applicable. This is a method development paper using publicly available data. No human participant was involved.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Coggon D, Barker D, Rose G. Chapter 2. Quantifying disease in populations. In: Coggon D, Barker D, Rose G, editors. Epidemiology for the Uninitiated, 4th Edition. John Wiley & Sons; 2009.
- Ford ES, Roger VL, Dunlay SM, Go AS, Rosamond WD. Challenges of ascertaining national trends in the incidence of coronary heart disease in the United States. J Am Heart Assoc. 2014;3:1–23.
- Kleindorfer D, Panagos P, Pancioli A, Khoury J, Kissela B, Woo D, et al. Incidence and short-term prognosis of transient ischemic attack in a population-based study. Stroke. 2005;36:720–3.
- Sigurdsson E, Thorgeirsson G, Sigvaldason H, Sigfusson N. Unrecognized myocardial infarction: Epidemiology, clinical characteristics, and the prognostic role of angina pectoris. Ann Intern Med. 1995;122:96–102.
- Go AS, Mozaffarian D, Roger VL, Benjamin EJ, Berry JD, Blaha MJ, et al. Heart disease and stroke statistics--2014 update: a report from the American Heart Association. Circulation. 2014;129:e28–e292.
- Podgor MJ, Leske MC. Estimating incidence from age-specific prevalence for irreversible diseases with differential mortality. Stat Med. 1986;5:573–8.
- Hallett TB. Estimating the HIV incidence rate – recent and future developments. Curr Opin HIV AIDS. 2011;6:102–7.
- Williams B, Gouws E, Wilkinson D, Karim SA. Estimating HIV incidence rates from age prevalence data in epidemic situations. Stat Med. 2001;20:2003–16.
- Hallett TB, Zaba B, Todd J, Lopman B, Mwita W, Biraro S, et al. Estimating incidence from prevalence in generalised HIV epidemics: methods and validation. PLoS Med. 2008;5:e80.
- Statistics Canada. National Population Health Survey: Household Component, Longitudinal (NPHS) 2012. Available from: http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&Id=75088. Accessed 24 Oct 2015.
- Yeh R, Sidney S, Chandra M, Sorel M, Selby J, Go A. Population trends in the incidence and outcomes of acute myocardial infarction. N Engl J Med. 2010;362:2155–65.
- CDC & NCHS. National Health and Nutrition Examination Survey Data. Hyattsville, MD; 2015.
- Vaccarino V, Parsons L, Every N, Barron HV, Krumholz HM; National Registry of Myocardial Infarction 2 Participants. Sex-based differences in early mortality after myocardial infarction. N Engl J Med. 1999;341:217–25.
- Vaccarino V, Krumholz HM, Yarzebski J, Gore JM, Goldberg RJ. Sex differences in 2-year mortality after hospital discharge for myocardial infarction. Ann Intern Med. 2001;134:173–81.
- CDC. Underlying Cause of Death. Available from: http://wonder.cdc.gov/controller/datarequest/D76. Accessed 14 Oct 2015.
- CDC. Environmental Public Health Tracking Network. Hospitalizations for Heart Attack. Available from: www.cdc.gov/ephtracking. Accessed 9 Oct 2015.
- Béland Y. Canadian Community Health Survey: methodological overview. Health Rep. 2002;13:9–14.
- Kaul P, Armstrong PW, Chang W-C, Naylor CD, Granger CB, Lee KL, et al. Long-term mortality of patients with acute myocardial infarction in the United States and Canada: comparison of patients enrolled in Global Utilization of Streptokinase and t-PA for Occluded Coronary Arteries (GUSTO)-I. Circulation. 2004;110:1754–60.
- Wijeysundera HC, Machado M, Farahati F, Wang X, Witteman W, van der Velde G, et al. Association of temporal trends in risk factors and treatment uptake with coronary heart disease mortality, 1994–2005. JAMA. 2010;303:1841–7.
- Statistics Canada. Cause of Death 2000–2011. Available from: http://www5.statcan.gc.ca/cansim/. Accessed 20 Nov 2015.
- Tambay JL, Catlin G. Sample design of the National Population Health Survey. Health Rep. 1995;7:29–38, 31–42.
- Parikh NI, Gona P, Larson MG, Fox CS, Benjamin EJ, Murabito JM, et al. Long-term trends in myocardial infarction incidence and case fatality in the National Heart, Lung, and Blood Institute’s Framingham Heart study. Circulation. 2009;119:1203–10.
- Talbott EO, Rager JR, Brink LL, Benson SM, Bilonick RA, Wu WC, et al. Trends in acute myocardial infarction hospitalization rates for US states in the CDC Tracking Network. PLoS One. 2013;8(5):e64457. doi:10.1371/journal.pone.0064457.
- Kannel WB, Abbott RD. Incidence and prognosis of unrecognized myocardial infarction. An update on the Framingham Study. N Engl J Med. 1984;311:1144–7.
- Van Der Heijden AAWA, Ortegon MM, Niessen LW, Nijpels G, Dekker JM. Prediction of coronary heart disease risk in a general, pre-diabetic, and diabetic population during 10 years of follow-up: Accuracy of the Framingham, SCORE, and UKPDS risk functions - The Hoorn Study. Diabetes Care. 2009;32:2094–8.
- Centers for Disease Control and Prevention. Indicator: Hospitalizations for Heart Attack. Limitations of the measures. 2012. Available from: http://ephtracking.cdc.gov/showIndicatorPages.action?selectedContentAreaAbbreviation=4&selectedIndicatorId=36&selectedMeasureId
- Chen J, Normand SLT, Wang Y, Drye EE, Schreiner GC, Krumholz HM. Recent declines in hospitalizations for acute myocardial infarction for Medicare fee-for-service beneficiaries: Progress and continuing challenges. Circulation. 2010;121:1322–8.
- Fang J, Alderman MH, Keenan NL, Ayala C. Acute myocardial infarction hospitalization in the United States, 1979 to 2005. Am J Med. 2010;123:259–66.