Hierarchies of evidence applied to lifestyle Medicine (HEALM): introduction of a strength-of-evidence approach based on a methodological systematic review
BMC Medical Research Methodology volume 19, Article number: 178 (2019)
Current methods for assessing strength of evidence prioritize the contributions of randomized controlled trials (RCTs). The objective of this study was to characterize strength of evidence (SOE) tools in recent use, to identify their application to lifestyle interventions for improved longevity, vitality, or successful aging, and to assess the implications of the findings.
The search strategy was created in PubMed and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline, supplemented by manual searching. Systematic reviews and meta-analyses of intervention trials or observational studies relevant to lifestyle intervention were included if they used a specified SOE tool. Data were collected for each SOE tool. Conditions necessary for assigning the highest SOE grading and treatment of prospective cohort studies within each SOE rating framework were summarized. An expert panel was convened to discuss the implications of findings for assessing evidence in the domain of lifestyle medicine.
Results and conclusions
A total of 15 unique tools were identified. Ten were tools developed and used by governmental agencies or other equivalent professional bodies and were applicable in a variety of settings. Of these 10, four require consistent results from RCTs of high quality to award the highest rating of evidence. Most SOE tools include prospective cohort studies only to note their secondary contribution to overall SOE as compared to RCTs. We developed a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding lifetime effects of health behaviors. Assessment of evidence relevant to lifestyle medicine requires a potential adaptation of SOE approaches when outcomes and/or exposures obviate exclusive or preferential reliance on RCTs. This systematic review was registered with the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148].
There is at present lively debate in the peer-reviewed literature regarding the nature of evidence supporting specific recommendations pertaining to nutrition [1, 2] and other components of lifestyle medicine. Lifestyle medicine can be defined as the use of behavioral modifications in diet, exercise, sleep, stress, or substance use/exposure to prevent, treat, and potentially reverse lifestyle-related, chronic disease. Such modifications may be implemented in clinical settings or more broadly as public health interventions, environmental changes to reinforce healthy default choices, or as online or distance-based interventions, but all with the intent to alter health behaviors among individuals.
Assessment of scientific evidence for a given question has evolved in academic publications from the presentation of an individual author’s conclusions into a formalized process [5,6,7] that involves conducting a systematic review of all available evidence within predetermined inclusion criteria. A common outcome of a systematic review is an assessment of “strength of evidence” (SOE) by the authors, starting with individual assessments of study quality followed by the use of a SOE grading tool to synthesize and summarize findings from all included studies. SOE is then often used to inform the next steps in public health and clinical practice, such as writing practice recommendations or assessing the strength of recommendations [8, 9].
Evaluating SOE for research questions related to health behaviors of individuals is of high importance for public health professionals and clinicians focusing on behavioral modification as part of clinical practice. Interest in lifestyle medicine is rapidly expanding globally. Lifestyle choices can have a major impact on burden of disease and premature death, even if the exact contributions of different components (exercise, diet, smoking, etc.) in the context of total lifestyle pattern are debated. Among the more frequent criticisms of lifestyle medicine is that conclusions and practice recommendations are not adequately informed by randomized controlled trials (RCTs) [11, 12]. Counter-arguments, noting the importance of other sources of evidence, have been published as well, at times in tandem [13, 14]. Thus, the importance of reliably interpreting relevant evidence about lifestyle choices has never been greater.
The majority of current systems for evaluating scientific evidence are well-suited to conventional medical treatment such as pharmacotherapy and discrete procedures. The movement towards evidence-based medicine (EBM) in recent years has emphasized the commonly accepted hierarchy of evidence and generally places results from RCTs above other study designs [16, 17]. While this is appropriate in many instances, RCTs are subject to specific biases and may not serve to address questions concerning the lifetime effects of health behaviors [18, 19].
Specifically, RCTs have methodological limitations that impede application to the investigation of longevity, overall vitality, compression of morbidity, and the lifetime [22,23,24] effects of diet, exercise, stress, sleep habits, and other lifestyle components, as well as ethical considerations depending on the research question. Such limitations have been examined in previous decades and, more recently, in new publications highlighting the drawbacks of over-reliance on an RCT-centric model. These limitations are particularly relevant in the context of developing healthcare practice guidelines for treatments that can withstand the challenges of real-world applications [16, 25]. Some such limitations of the RCT model include the following:
Cost constraints and challenges with adherence make it difficult to randomize individuals to lifestyle interventions and maintain the prescribed behaviors for sufficient time periods (decades) to investigate the effects of such exposures on mortality or long-term morbidity [26, 27].
Blinding of the treatment group is only possible when the treatment is ostensibly similar to the placebo. While this is straightforward in drug trials, it is difficult at best, and often impossible, when modifying health behaviors.
The generalizability of results in intervention trials to the broader population may be limited.
Some debate exists around differences in results seen between observational studies and RCTs. Depending on the research questions, evidence from observational cohort studies may be substantially more informative in drawing conclusions about overall SOE. There may be a particular advantage in hybridizing evidence sources, recognizing that different evidence sources, from bench research, to intervention studies in humans, to observational epidemiology, make distinct contributions to understanding [17, 29, 30]. Therefore, it would be useful to have a method of evaluating SOE that is tailored to assessing lifestyle interventions and that can offer a more holistic assessment of evidence spanning diverse methods.
We conducted a methodologic systematic review of SOE tools to inform the answer to this question: When RCTs cannot, for whatever reason, serve as the primary evidence source, are there alternative assemblies of evidence that can be used to achieve comparable confidence in a given exposure-outcome relationship?
The research team was convened by the American College of Lifestyle Medicine (ACLM) under joint auspices with the True Health Initiative (THI) to (1) conduct a methodological systematic review of SOE grading tools in recent or current use to characterize which assemblies of evidence produce an evidence rating of highest strength, and (2) analyze the findings and their implications for potentially developing a new grading tool to evaluate SOE in the specific context of lifestyle medicine, where good RCTs are often not available or possible.
The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement was followed in reporting this systematic review. The protocol was prospectively developed and registered on the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148] [32, 33]. An expert panel (Additional file 1) in evidence-based medicine and its application to nutrition/lifestyle behaviors was convened to assess the findings and make recommendations.
The search strategy was built in PubMed in consultation with a librarian and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline. The databases were searched for studies containing keywords related to either lifestyle or longevity. To identify only SOE tools in recent or current use, searches included studies published during the previous five years from the start of the project, from 01/01/2013–11/07/2017. There were seven exposures of interest related to lifestyle: diet, exercise, stress, social relationships/support, addiction(s), sleep, and genetic-based factors with potential for epigenetic modification. Additional search terms were included to restrict the scope of our literature search to papers related to avoidance of chronic disease: longevity, vitality, and healthy or successful aging. Keywords used in the search strategy are presented in Table 1. Search strategies were restricted to systematic reviews and meta-analyses conducted among humans and published in English, as the research team was not able to read or screen non-English papers. Umbrella reviews (systematic reviews of systematic reviews) were not included. To further focus on evaluation of evidence related to the lifetime effects of health behaviors and healthy aging, PubMed and PsycINFO searches were limited to studies in participants 65+ years of age. The complete search strategy for all five databases is presented in Additional file 2.
To identify relevant SOE tools in current or recent use, we included systematic reviews and meta-analyses of intervention trials or observational studies that both examined lifestyle medicine exposures and outcomes and evaluated SOE using a specified SOE tool. The inclusion and exclusion criteria applied in abstract and full-text screening are presented in Table 2. Included studies were required to contain only studies conducted in human adults and with at least one comparison group. Studies were excluded if they were conducted in children, healthcare workers, animals, or in vitro or if they only included single-arm trials (i.e., no comparison group). Studies were also excluded if they utilized any pharmaceutical- or supplement-based interventions, utilized genome-wide-association-studies (GWAS), or focused on research methods, validation of instruments or questionnaires, medical devices, or other assays. Additionally, given our focus on lifestyle medicine, studies were excluded if they examined research questions not relevant to lifestyle medicine (e.g., focused on the domains of injury severity, effectiveness of diagnostic tools or medical devices, or mechanistic questions that are tangential to lifestyle interventions or that were not clearly modifiable by lifestyle factors).
Study selection process
After merging results from all five databases and removing duplicates, all citations were title-screened by a single investigator [MK] to exclude in vitro, cell and stem cell studies, animal studies, and studies whose designs were clearly not a systematic review or meta-analysis, such as studies that named other study designs in the title (case report, randomized controlled trial, prospective cohort study, etc.). All studies with ambiguous titles were included at this stage of screening. All abstracts identified via the literature searches were then independently double-screened (independently screened by two different investigators) [MK, MSW, AS] using the inclusion and exclusion criteria (Table 2) via the open-source, online software Rayyan. Full-text articles were retrieved for all abstracts deemed potentially relevant. Keyword text mining was performed to identify papers that mentioned text relevant to the use of a SOE tool [MK, MSW]. Full keyword search terms are presented in Additional file 3. Articles containing one or more of the keywords were then independently double-screened based on inclusion and exclusion criteria [MK, MSW]. All abstract and full-text screening conflicts were resolved through group discussion and final decisions reached by group consensus.
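The double-screening step described above can be illustrated with a minimal sketch. It assumes each screener's include/exclude decisions are available as a simple mapping from record ID to a boolean; the function name, record IDs, and decisions are hypothetical and do not reflect Rayyan's actual export format or the decisions made in this review.

```python
from typing import Dict, List


def find_screening_conflicts(
    reviewer_a: Dict[str, bool], reviewer_b: Dict[str, bool]
) -> List[str]:
    """Return the IDs of records on which two independent screeners disagree.

    True means "include" and False means "exclude". Only records screened
    by both reviewers are compared.
    """
    shared_ids = reviewer_a.keys() & reviewer_b.keys()
    return sorted(rid for rid in shared_ids if reviewer_a[rid] != reviewer_b[rid])


# Hypothetical decisions for three abstracts (illustrative only).
decisions_a = {"rec1": True, "rec2": False, "rec3": True}
decisions_b = {"rec1": True, "rec2": True, "rec3": False}

# Records flagged here would go to group discussion for a consensus decision.
conflicts = find_screening_conflicts(decisions_a, decisions_b)
print(conflicts)
```

In this sketch, agreement on a record settles its status, while any disagreement is set aside for resolution by group consensus, mirroring the process described in the text.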
Additionally, the results from the systematic search process were complemented with manual searching on websites of major agencies recommended by the expert panel that conduct or commission systematic reviews. Agency websites were searched for officially adopted SOE tools [MK, MSW, AS]. A list of unique tools was compiled from the combination of the systematic and manual searches [MK].
Data extraction forms were created and received approval from the entire research team prior to use. The information extracted included the following: date first published; purpose of the evaluation; intended audience; number of levels of SOE; the definition of the highest level of SOE; and the placement of cohort studies in the framework of SOE. All data extractors initially extracted 10% of the articles to pilot uniformity of extractions. For all remaining articles, each article was extracted by one investigator and reviewed and confirmed by a second [MK, MSW, AS]. Any disagreements were discussed among the research team and resolved via group consensus.
Risk of bias (ROB) in individual studies
As this systematic review’s focus is on SOE grading systems related to lifestyle medicine outcomes and not studies’ specific lifestyle-related findings, ROB assessments were not conducted. However, if ROB assessments played a role in the included SOE grading systems, details were extracted.
Data were summarized in narrative form with regard to the conditions necessary for assigning the highest SOE grading (e.g., for assigning a grade “A” or level “1” rating). Next, the treatment of prospective cohort studies within each SOE rating framework was qualitatively summarized [MK, MSW, AS].
The PRISMA flow diagram for study selection and exclusion is presented in Fig. 1. The manual search guided by the expert panel identified a total of eight unique SOE tools. The systematic search strategy identified a total of 1196 studies. Of these, 267 studies contained one or more relevant keywords. From these, a total of 33 studies mentioned using a specific SOE tool: 23 studies used Grading of Recommendations, Assessment, Development and Evaluation (GRADE), which had previously been identified in the manual search, and 10 studies used a total of seven other unique SOE tools. Thus, a total of 15 unique tools are presented in Table 3.
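The tally behind these counts can be checked with a short sketch. It uses only the figures reported above and assumes, as the stated total implies, that none of the seven other tools found by the systematic search overlapped with the eight tools found manually; the variable names are illustrative.

```python
# Counts reported in the text.
tools_from_manual_search = 8        # includes GRADE
studies_using_grade = 23
studies_using_other_tools = 10
other_tools_from_systematic_search = 7  # assumed not to overlap with the manual list

# Studies that named a specific SOE tool.
studies_with_named_tool = studies_using_grade + studies_using_other_tools

# GRADE is counted once (in the manual list), so only the seven other
# tools add to the number of unique tools.
unique_tools = tools_from_manual_search + other_tools_from_systematic_search

print(studies_with_named_tool)  # 33
print(unique_tools)             # 15
```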
All 15 tools rated SOE using three to five levels, with the exception of the US Food and Drug Administration (FDA) tool in reference to qualified health claims (two levels). Of the 15 tools included, five were lesser-known methods defined by authors and primarily related to pain or physical rehabilitation and treatment [37,38,39,40,41,42,43,44,45,46]. The other 10 SOE tools were developed and used by well-known agencies and are applicable in a variety of settings [35, 36, 47,48,49,50,51,52,53]. Of these 10, four clearly require consistent results from RCTs of high quality to award the highest rating of evidence: GRADE, the FDA tool in reference to health claims for food products, the American College of Cardiology / American Heart Association Task Force on Practice Guidelines Levels of Evidence, and the Evidence-based Practice Center (EPC) method for grading SOE.
Four SOE tools describe more flexibility in the use of study design in determining ratings: the Community Preventive Services Task Force method references study design and its “suitability for answering the research question”; the Grading System from the Academy of Nutrition and Dietetics describes “studies of strong design for the question”; the Joanna Briggs Institute Levels of Evidence identifies different levels of evidence under the separate headings effectiveness, diagnosis, prognosis, economic evaluations, or meaningfulness; and the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence uses a grid of five levels of evidence, where each level is specifically tailored to seven different kinds of research questions and supports a variety of combinations of quantity and quality of evidence depending on the specific research question.
With the exception of the OCEBM Levels of Evidence, specific mention of observational studies was made only in reference to their secondary contribution to overall SOE relative to RCTs, unless RCTs were methodologically flawed.
Conceptualization of SOE approach specific to lifestyle medicine
Upon completion of the systematic review, the expert panel convened to discuss the findings. The results confirmed that the following methodological elements within existing SOE tools in recent use are lacking:
Criteria to evaluate exposure-outcome relationships examined over years/decades/lifetimes
Criteria to evaluate behaviors/exposures used in lifestyle medicine that may not allow for randomization or blinding (e.g., smoking or long-term dietary patterns)
Guidance to synthesize findings from diverse study designs, except to prioritize RCTs over observational studies
To address these issues, the Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) investigators enumerated the particular contributions of diverse research methods to a complete understanding of exposure/treatment effects, as shown in Table 4.
Based on this simple framework, a new method for selecting the criteria by which SOE can be assessed was developed, titled Evidence Threshold Pathway Mapping (Table 5). It is intended to formalize and make explicit the decision process of which method or tool to use to evaluate SOE. With strength defined operationally as the relevant “threshold” value for some level of confidence, this potential methodologic innovation offers an opportunity to identify the assemblies of evidence that are most appropriate for a given research question, such as change in intermediate risk factors, short-term alleviation of disease symptoms, long-term improvement in diagnosed disease, or long-term prevention. The basic propositions underlying Evidence Threshold Pathway Mapping are that (a) different methods of research are best suited for making different yet complementary contributions to the overall weight of relevant evidence, and (b) different assemblies of evidence can produce the same aggregate strength or confidence. We recognize that in the absence of RCT data for treatment effects, certainty about treatment effects from other types of evidence may be more limited; thus, there is a basis to weight the contributions of RCTs preferentially. However, other types of evidence may still offer a spectrum of certainty or additional context for understanding.
Also implicit in this approach is the contention that various research methods serve different objectives related to evidence about a causal pathway. Bench science and animal model studies are most often used to establish clear and decisive evidence of mechanisms but cannot establish in vivo effects in humans. Controlled intervention studies, and most notably RCTs, are used to establish attribution with confidence, while minimizing bias and controlling for both known and unknown confounders. However, RCTs are not always ethically or practically feasible, and they are demanding to implement at the population level or over time periods relevant to lifetime vitality. They can also introduce sampling bias that may greatly limit generalizability or external validity. Observational epidemiology, notably prospective cohort studies and even ethnographic studies, can readily assess associations at scale and over extended time periods (decades), but these are subject to biases, including sampling bias and residual confounding, and they lack the capacity of RCTs to assign attribution with clarity.
Accordingly, evidence is strongest when the unique contributions of these diverse methods are synthesized. Making conclusions by drawing from a diversity of evidence sources can potentially allow for confidence in study design methods from one type of research, confidence in attribution from another type, confidence in effects at scale from yet another, and confidence in effects over extended timelines from another still. This amalgamation of complementary evidence is especially important when research questions cannot be readily answered by one study design alone (e.g., What dietary pattern produces the best health outcomes over a lifetime?). Such considerations are a subject of active discussion in nutrition research [56, 57].
Thus, we introduce a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), shown in Table 6, to illustrate a means of assessing SOE in future systematic reviews within the domain of lifestyle medicine when the use of GRADE or another SOE tool is not appropriate.
HEALM incorporates the variety of sources of evidence available and synthesizes their contributions into one rating. It is important to note that the method described in Tables 5 and 6 suggests one specific framework for handling a set of considerations around SOE. Alternative ways of handling such considerations include using a conventionally defined tool such as GRADE, not utilizing a predetermined scoring system, or uniquely adapting an existing tool to the research question being asked. We introduce Evidence Threshold Pathway Mapping and HEALM to illustrate one example of a suitable, customized approach for researchers in lifestyle medicine that can be applied, tested, and validated in practice. The proposed approach for evaluating SOE is informed by the flexibility and specificity presented in OCEBM. HEALM adapts this approach to the specific exigencies of lifestyle medicine, while placing an emphasis on the alignment of research methods with specific questions related to causal pathways. To identify when use of such a tool might be appropriate, we suggest employing Evidence Threshold Pathway Mapping (Table 5) to map the pathway for evidence evaluation along the branches of a simple decision tree. For example, this process produces a suggestion to use the HEALM tool for all research questions concerning lifetime cumulative effects of specific health behaviors, as lifetime effects cannot be assessed in < 5 years. However, it suggests using GRADE for other questions that are feasibly answered with RCTs.
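The two branches named in this example can be written out as a minimal sketch of such a decision tree. This is not a reproduction of Table 5: the class, field names, and function below are hypothetical, and the logic encodes only what the text states, namely that questions about lifetime cumulative effects (or questions for which GRADE is not appropriate because RCTs cannot feasibly answer them) point to HEALM, while questions feasibly answered with RCTs point to GRADE.

```python
from dataclasses import dataclass


@dataclass
class ResearchQuestion:
    """Hypothetical description of a research question for tool selection."""
    description: str
    concerns_lifetime_effects: bool   # lifetime cumulative effects of a health behavior
    feasibly_answered_by_rcts: bool   # could RCTs serve as the primary evidence source?


def suggest_soe_tool(question: ResearchQuestion) -> str:
    """Suggest an SOE tool using the two branches described in the text.

    Assumption: any question not feasibly answered by RCTs is routed to
    HEALM, reflecting its stated role when GRADE is not appropriate.
    """
    if question.concerns_lifetime_effects or not question.feasibly_answered_by_rcts:
        return "HEALM"
    return "GRADE"


# Illustrative questions (not drawn from the review's included studies).
lifetime_diet = ResearchQuestion(
    "Which dietary pattern produces the best health outcomes over a lifetime?",
    concerns_lifetime_effects=True,
    feasibly_answered_by_rcts=False,
)
short_term_risk_factor = ResearchQuestion(
    "Does a 12-week exercise program change an intermediate risk factor?",
    concerns_lifetime_effects=False,
    feasibly_answered_by_rcts=True,
)

print(suggest_soe_tool(lifetime_diet))           # HEALM
print(suggest_soe_tool(short_term_risk_factor))  # GRADE
```

A fuller implementation would follow every branch of Table 5; the sketch only shows how making the choice of tool explicit turns it into a reproducible step rather than an unstated judgment.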
Lifestyle behaviors are among the leading determinants of health outcomes, with non-communicable diseases causing nearly three-quarters of deaths globally. Dietary patterns have recently risen to the very top of this list, and there is intense debate about the strength and reliability of pertinent evidence [1,2,3]. The majority of current systems for evaluating scientific evidence are well-suited to evaluating pharmaceutical approaches to managing disease, but currently a system for evaluating SOE particular to lifestyle medicine does not exist.
Assessment of SOE requires grading the methodological quality and ROB of individual included studies, assessing the consistency and internal validity of studies addressing a specific research question, and forming conclusion statements. Such SOE conclusions can thus inform the discussion on the weight of evidence, informed by multiple studies providing for external validity or generalizability to various populations, settings, and circumstances.
Evidence Threshold Pathway Mapping contends that the same level of confidence, and the same strength of evidence, can be achieved by a variety of assemblies of evidence. The approach respects the unique value of RCTs in establishing attribution and does not assume RCTs are interchangeable with other study designs. Rather, Evidence Threshold Pathway Mapping acknowledges that RCTs may be precluded for various reasons with regard to a given outcome and that other complementary evidence should be considered. Even then, such trials may contribute to understanding by assessing attribution with use of interim measures and/or surrogate markers. This method of identifying the SOE approach used for evaluation based on the nature of the question being asked is informed by the approach taken in the OCEBM tool, which tailors SOE evaluation for different types of research questions.
HEALM, derived from application of the Evidence Threshold Pathway Mapping approach, is one unique, potential approach organized to frame discussion of existing evidence available to answer specific research questions relevant to lifestyle medicine when existing tools such as GRADE are not viable options (i.e., the question is not fully addressable through RCTs). The scoring, similar to other SOE tools, relies on expert consensus, but is also informed by quantitative scoring considerations used in umbrella reviews to evaluate results from multiple meta-analyses. While grading SOE does not necessarily mean meta-analyses will always be conducted, a quantitative framework to guide discussion will lead to greater consistency of results. HEALM defines categorical levels of SOE, as is conventionally done when evaluating evidence. However, it should be noted that such categories are derived from a continuum of SOE and that the value of the categories is to increase the utility of the tool for communicating findings. The intended purpose of HEALM is to evaluate SOE, which can then be used to develop strength of recommendation-based practice statements. The construct first introduced here may gain traction as is; it may be revised and refined by others; or it may be replaced outright if an alternative metric serving the same goals performs better.
The need for innovation in SOE assessment is in part because the RCT holds a position of relative primacy in the adjudication of medical evidence. Arguments favoring reliance on RCTs rightly invoke the merits of this methodology, namely defense against diverse kinds of bias and protection against confounders both known and unknown, thus prioritizing internal validity. There are, however, diverse and valid concerns with the limitations of RCTs in achieving external validity.
Also of concern are the cases in which observational and intervention trial results appear to be in conflict with one another. In some cases, RCTs may be testing different hypotheses than observational studies, and conclusions from one investigation may not be generalizable to all populations. For example, a review analysis on the use of hormone replacement therapy (HRT) among women in the Women’s Health Initiative affirms the consistency of findings across observational and intervention data if the age at time of starting HRT is considered.
A recent Cochrane systematic review concluded such differences are likely not due to differences in study design alone; rather, RCTs and observational studies tend to produce similar effect sizes for a range of health outcomes, and disagreements are likely due to other study characteristics, such as testing different hypotheses or duration of follow-up. While there are examples of RCTs that document outcomes after several years of follow-up post-intervention [62, 63], the challenges of adherence severely limit feasibility of continuous interventions over decades. To the authors’ knowledge, there are no RCTs that have successfully and continuously implemented an intervention, especially one with a potentially small effect size, for the decades necessary to test “lifetime” effects. Thus, the prevailing impression that results from RCTs are consistently superior may be exaggerated, with the benefits and risks of hormone replacement therapy providing an example of the partial contributions to understanding made possible by both RCTs and observational cohort studies [64,65,66,67].
In contrast, there are clear cases in which observational studies offer a superior method of evaluating questions concerning the cumulative, lifetime effects of lifestyle practices. A key example of a trial whose recruitment was designed to maximize the number of endpoints is the Alpha-Tocopherol, Beta-carotene Lung Cancer Prevention (ATBC) Study, which targeted male smokers. For hard endpoints such as cancer and cancer-related mortality, short-term RCTs would be of insufficient duration to observe the outcome of interest, and randomizing participants to exposures like smoking would be impossible for ethical reasons.
The HEALM tool scores evidence, lending particular weight to RCTs for the clarification of causal effects and attribution. The tool, however, allows for rating evidence as strong even if RCT data are not more than suggestive, provided evidence from all other complementary research approaches is decisive and aligned. More importantly, short-term evidence from RCTs, or a focus on isolated biomarkers, absent any suitably long-term data addressing hard outcomes, would not score as “strong” in the realm of lifestyle medicine because of the great potential divide between short- and long-term effects. As an example, many serious infectious diseases lower weight and blood lipids; such “favorable” trends in biomarkers are obviously not indicative of beneficial health effects in the long term. This adaptation of established approaches readily accommodates the imperative of judging the impact of lifestyle practices on health outcomes over the full human life span.
A strength of this study was its use of a methodological systematic review to capture existing and recently used SOE tools, thus ensuring that any new method proposed would offer a novel contribution to address current methodological gaps. Limitations of this study included the focus in the search strategy on healthy aging as an outcome, rather than risk for specific chronic diseases. The search strategy was constructed in alignment with the target outcomes of lifestyle medicine practice (healthy aging, as opposed to chronic disease), and inclusion of all major chronic disease outcomes would not have been practical due to the large number of search results. Additionally, the search strategy was limited to systematic reviews of studies conducted among those ≥65 years, not because lifestyle medicine is only relevant for older populations, but because this focused the search strategy to identify studies in the domain of longevity. SOE tools used in these contexts would be potential best matches for evaluating evidence concerning other lifestyle medicine-type questions. However, manual searching for SOE tools based on expert panel recommendations augmented the systematic review results to the degree that all major tools known by the expert panel are included in our results.
Finally, the HEALM construct is dependent on conclusions about the “plurality” of evidence from distinct research methods. Other than results produced from systematic review or meta-analysis, there is no universal standard for sufficient or sufficiently consistent evidence to establish the veracity of a given causal pathway or weight of the evidence for a given research topic. Even meta-analyses and systematic reviews fail to reach this standard, because in “crowded” research domains more than one such study is common and they may conflict with one another. The Community Preventive Services Task Force (CPSTF) provides some guidance on assigning strength of recommendations based on SOE conclusions by suggesting that inconsistent evidence should lead to separate recommendations for specific populations, and that no conclusions should be reached in the case of conflicting evidence. However, this guidance does not provide a framework for synthesizing strength and weight of evidence more broadly. Further, a limitation of HEALM is that it utilizes categories to assign relative levels of confidence, though this limitation is common to existing SOE tools.
The problem of establishing an operational definition for the “weight of evidence,” or a decisive plurality of studies, is in no way specific to lifestyle medicine. This is a generic challenge pertaining to all assessments of overall evidence, and is thus deemed beyond the scope of this particular effort. This group simply notes the importance of this issue, and its pertinence to both Evidence Threshold Pathway Mapping and HEALM. This paper invites attention to the matter and highlights the opportunity to fortify operational definitions in this area.
This project was commissioned with a preferential focus on lifestyle medicine, but the implications apply broadly to public health. Lifestyle practices and exposures (dietary patterns, physical activity patterns, sleep patterns, tobacco and alcohol exposures, psychological stressors, and social connections), while uniquely emphasized in lifestyle medicine (4), pertain to all fields of medicine and public health and to all health professionals.
Future research should test application of Evidence Threshold Pathway Mapping and HEALM by conducting systematic reviews on specific research questions in the domain of lifestyle medicine. The HEALM construct should evolve, informed by research in which it is applied.
SOE tools in current use are generally poorly suited to long-term effects of lifestyle choices such as diet, exercise, sleep, and stress. Evidence Threshold Pathway Mapping, a method for identifying multiple assemblies of evidence to achieve a given grade, extends the robust assessment of evidence to a wider array of questions important to medicine and public health. HEALM is proposed as one example of a tool specifically adapted to questions in lifestyle medicine and nutrition. Application, testing, and validation of the performance of HEALM and consideration of its relevance to this domain of medicine are encouraged.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study. All data used in this systematic review are accessible via the papers referenced in the manuscript.
ACLM: American College of Lifestyle Medicine
ATBC: Alpha-Tocopherol, Beta-carotene Lung Cancer Prevention
CPSTF: Community Preventive Services Task Force
EPC: Evidence-based Practice Center
FDA: Food and Drug Administration
GRADE: Grading of Recommendations, Assessment, Development and Evaluation
HEALM: Hierarchies of Evidence Applied to Lifestyle Medicine
HRT: hormone replacement therapy
OCEBM: Oxford Centre for Evidence-Based Medicine
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
RCT: randomized controlled trial
ROB: risk of bias
SOE: strength of evidence
THI: True Health Initiative
Ioannidis JP, Trepanowski JF. Disclosures in nutrition research: why it is different. JAMA. 2018;319(6):547–8.
Laville M, Segrestin B, Alligier M, Ruano-Rodríguez C, Serra-Majem L, Hiesmayr M, et al. Evidence-based practice within nutrition: what are the barriers for improving the evidence and how can they be dealt with? Trials. 2017;18(1):425.
Jørgensen T, Jacobsen RK, Toft U, Aadahl M, Glümer C, Pisinger C. Effect of screening and lifestyle counselling on incidence of ischaemic heart disease in general population: Inter99 randomised trial. BMJ. 2014;348:g3617.
Katz DLFE, Medicine FMDL. Maxcy-Rosenau-Last Public Health and Preventive Medicine. 16th ed. In production; January 2019.
Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.
Owens DK, Lohr KN, Atkins D, Treadwell JR, Reston JT, Bass EB, et al. AHRQ series paper 5: grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the effective health-care program. J Clin Epidemiol. 2010;63(5):513–23.
Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the journal of clinical epidemiology. J Clin Epidemiol. 2011;64(4):380–2.
Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schünemann HJ. Rating quality of evidence and strength of recommendations: what is “quality of evidence” and why is it important to clinicians? BMJ. 2008;336(7651):995.
Guyatt GH, Oxman AD, Kunz R, Falck-Ytter Y, Vist GE, Liberati A, et al. Rating quality of evidence and strength of recommendations: going from evidence to recommendations. BMJ. 2008;336(7652):1049.
Kushner RF, Sorensen KW. Lifestyle medicine: the future of chronic disease management. Curr Opin Endocrinol Diabetes Obes. 2013;20(5):389–95.
Ioannidis JP. We need more randomized trials in nutrition—preferably large, long-term, and with negative results. Oxford University Press. 2016.
Rosen L, Manor O, Engelhard D, Zucker D. In defense of the randomized controlled trial for health promotion research. Am J Public Health. 2006;96(7):1181–6.
Willett WC. Diet and health—finding a path to Veritas. Eur J Epidemiol. 2018;33(2):127–35.
Barnard ND, Willett WC, Ding EL. The misuse of meta-analysis in nutrition research. JAMA. 2017;318(15):1435–6.
Mokdad AH, Ballestros K, Echko M, Glenn S, Olsen HE, Mullany E, et al. The state of US health, 1990-2016: burden of diseases, injuries, and risk factors among US states. JAMA. 2018;319(14):1444–72.
Horwitz RI, Hayes-Conroy A, Caricchio R, Singer BH. From evidence based medicine to medicine based evidence. Am J Med. 2017.
Hannan EL. Randomized clinical trials and observational studies: guidelines for assessing respective strengths and limitations. JACC Cardiovasc Interv. 2008;1(3):211–7.
Feinstein AR, Horwitz RI. Problems in the “evidence” of “evidence-based medicine”. Am J Med. 1997;103(6):529–35.
Frieden TR. Evidence for health decision making - beyond randomized, controlled trials. N Engl J Med. 2017;377(5):465–75.
Guerin E. Disentangling vitality, well-being, and quality of life: a conceptual examination emphasizing their similarities and differences with special application in the physical activity domain. J Phys Act Health. 2012;9(6):896–908.
Fries J, Green L, Levine S. Health promotion and the compression of morbidity. Lancet. 1989;333(8636):481–3.
Charlton BM, Rich-Edwards JW, Colditz GA, Missmer SA, Rosner BA, Hankinson SE, et al. Oral contraceptive use and mortality after 36 years of follow-up in the Nurses' health study: prospective cohort study. BMJ. 2014;349:g6356.
Elliot AJ, Mooney CJ, Infurna FJ, Chapman BP. Associations of lifetime trauma and chronic stress with C-reactive protein in adults ages 50 years and older: examining the moderating role of perceived control. Psychosom Med. 2017;79(6):622–30.
Reinikainen J, Laatikainen T, Karvanen J, Tolonen H. Lifetime cumulative risk factors predict cardiovascular disease mortality in a 50-year follow-up study in Finland. Int J Epidemiol. 2015;44(1):108–16.
Ioannidis JP. Some main problems eroding the credibility and relevance of randomized trials. Bull NYU Hosp Jt Dis. 2008;66(2):135–9.
Seidelmann SB, Claggett B, Cheng S, Henglin M, Shah A, Steffen LM, et al. Dietary carbohydrate intake and mortality: a prospective cohort study and meta-analysis. Lancet Public Health. 2018;3(9):e419–e28.
Crichton GE, Howe PR, Buckley JD, Coates AM, Murphy KJ, Bryan J. Long-term dietary intervention trials: critical issues and challenges. Trials. 2012;13:111.
The alpha-tocopherol, beta-carotene lung cancer prevention study: design, methods, participant characteristics, and compliance. The ATBC Cancer Prevention Study Group. Ann Epidemiol. 1994;4(1):1–10.
Nakamura R. Animal models and basic science—bench to bedside: session introduction. ILAR J. 2011;52(Suppl_1):493.
Sanson-Fisher RW, Bonevski B, Green LW, D'Este C. Limitations of the randomized controlled trial in evaluating population-based health interventions. Am J Prev Med. 2007;33(2):155–61.
Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9, W64.
PROSPERO - International prospective register of systematic reviews. Available from: https://www.crd.york.ac.uk/prospero/.
David Katz, Jonathan Fielding, Lawrence Green, Ralph Horwitz, John Ioannidis, Walter Willett, Mei Chung, Micaela Karlsen, Marissa Shams-White, Ayumi Saito, Deena Wang. Hierarchies of evidence applied to lifestyle Medicine (HEaLM): a methodological systematic review of evidence grading tools to inform development of the best method to assess strength of evidence for lifestyle medicine interventions and related clinical outcomes, including longevity and healthy aging. PROSPERO 2018 CRD42018082148 available from: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=82148.
Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):210.
Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.
Food and Drug Administration. Guidance for industry and FDA: interim evidence-based ranking system for scientific data. July 10, 2003.
Sallis JF, Prochaska JJ, Taylor WC. A review of correlates of physical activity of children and adolescents. Med Sci Sports Exerc. 2000;32(5):963–75.
Bully P, Sanchez A, Zabaleta-del-Olmo E, Pombo H, Grandes G. Evidence from interventions based on theoretical models for lifestyle modification (physical activity, diet, alcohol and tobacco use) in primary care settings: a systematic review. Prev Med. 2015;76 Suppl:S76–93.
van Poppel MN, Koes BW, Smid T, Bouter LM. A systematic review of controlled clinical trials on the prevention of back pain in industry. Occup Environ Med. 1997;54(12):841–7.
Slavin RE. Best evidence synthesis: an intelligent alternative to meta-analysis. J Clin Epidemiol. 1995;48(1):9–18.
Mansi S, Milosavljevic S, Baxter GD, Tumilty S, Hendrick P. A systematic review of studies using pedometers as an intervention for musculoskeletal diseases. BMC Musculoskelet Disord. 2014;15(1):231.
Geraedts H, Zijlstra A, Bulstra SK, Stevens M, Zijlstra W. Effects of remote feedback in home-based physical activity interventions for older adults: a systematic review. Patient Educ Couns. 2013;91(1):14–24.
Singh A, Uijtdewilligen L, Twisk JW, Van Mechelen W, Chinapaw MJ. Physical activity and performance at school: a systematic review of the literature including a methodological quality assessment. Arch Pediatr Adolesc Med. 2012;166(1):49–55.
Peurala SH, Karttunen AH, Sjögren T, Paltamaa J, Heinonen A. Evidence for the effectiveness of walking training on walking and self-care after stroke: a systematic review and meta-analysis of randomized controlled trials. J Rehabil Med. 2014;46(5):387–99.
Stuck AE, Walthert JM, Nikolaus T, Bula CJ, Hohmann C, Beck JC. Risk factors for functional status decline in community-living elderly people: a systematic literature review. Soc Sci Med. 1999;48(4):445–69.
van der Vorst A, Zijlstra GR, De Witte N, Duppen D, Stuck AE, Kempen GI, et al. Limitations in activities of daily living in community-dwelling people aged 75 and over: a systematic literature review of risk and protective factors. PLoS One. 2016;11(10):e0165127.
Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Aguero L, Truman BI, et al. Developing an evidence-based guide to community preventive services--methods. The task force on community preventive services. Am J Prev Med. 2000;18(1 Suppl):35–43.
Moyer V, Bibbins-Domingo K. The US preventive services task force: what is it and what does it do? N C Med J. 2015;76(4):238–42.
Scientific Report of the 2015 Dietary Guidelines Advisory Committee. 2015.
Handu D, Moloney L, Wolfram T, Ziegler P, Acosta A, Steiber A. Academy of nutrition and dietetics methodology for conducting systematic reviews for the evidence analysis library. J Acad Nutr Diet. 2016;116(2):311–8.
Berkman ND, Lohr KN, Ansari MT, Balk EM, Kane R, McDonagh M, et al. Grading the strength of a body of evidence when assessing health care interventions: an EPC update. J Clin Epidemiol. 2015;68(11):1312–24.
Joanna Briggs Institute. New JBI Levels of Evidence. 2013. Available from: https://joannabriggs.org/sites/default/files/2019-05/JBI%20Levels%20of%20Evidence%20Supporting%20Documents-v2.pdf.
Oxford Center for Evidence-Based Medicine Levels of Evidence. Accessed online May Afhwcno-l-o-e.
American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Methodology Manual and Policies From the ACCF/AHA Task Force on Practice Guidelines. 2010 (accessed May 2018). Available from: http://professional.heart.org/professional/GuidelinesStatements/PublicationDevelopment/UCM_320470_Methodologies-and-Policies-from-the-ACCAHA-Task-Force-on-Practice-Guidelines.jsp.
Katz DL, Meller S. Can we say what diet is best for health? Annu Rev Public Health. 2014;35:83–103.
Blake P, Durao S, Naude CE, Bero L. An analysis of methods used to synthesize evidence and grade recommendations in food-based dietary guidelines. Nutr Rev. 2018;76(4):290–300.
National Academies of Sciences, Engineering, and Medicine. Guiding principles for developing dietary reference intakes based on chronic disease. National Academies Press; 2017.
Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1736–1788.
Belbasis L, Köhler C, Stefanis N, Stubbs B, van Os J, Vieta E, et al. Risk factors and peripheral biomarkers for schizophrenia spectrum disorders: an umbrella review of meta-analyses. Acta Psychiatr Scand. 2018;137(2):88–97.
Manson JE, Chlebowski RT, Stefanick ML, Aragaki AK, Rossouw JE, Prentice RL, et al. Menopausal hormone therapy and health outcomes during the intervention and extended poststopping phases of the Women’s Health Initiative randomized trials. JAMA. 2013;310(13):1353–68.
Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;(4):MR000034.
Estruch R, Ros E, Salas-Salvado J, Covas MI, Corella D, Aros F, et al. Primary prevention of cardiovascular disease with a Mediterranean diet supplemented with extra-virgin olive oil or nuts. N Engl J Med. 2018;378(25):e34.
Goldberg RB, Bray GA, Marcovina SM, Mather KJ, Orchard TJ, Perreault L, et al. Non-traditional biomarkers and incident diabetes in the diabetes prevention program: comparative effects of lifestyle and metformin interventions. Diabetologia. 2018.
Sarrel PM, Njike VY, Vinante V, Katz DL. The mortality toll of estrogen avoidance: an analysis of excess deaths among hysterectomized women aged 50 to 59 years. Am J Public Health. 2013;103(9):1583–8.
Prentice RL, Langer R, Stefanick ML, Howard BV, Pettinger M, Anderson G, et al. Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women's Health Initiative clinical trial. Am J Epidemiol. 2005;162(5):404–14.
Hodis HN, Sarrel PM. Menopausal hormone therapy and breast cancer: what is the evidence from randomized trials? Climacteric : the journal of the International Menopause Society. 2018;21(6):521–8.
Silverman SL. From randomized controlled trials to observational studies. Am J Med. 2009;122(2):114–20.
The research team thanks Amy Lapidow at the Tufts University Hirsh Health Sciences Library for her assistance with the search strategy along with the members of the HEALM Expert Panel: Mei Chung, Lawrence Green, Jonathan Fielding, and Walter Willett.
The views expressed are those of the authors and do not reflect an official position of the National Cancer Institute or National Institutes of Health.
Funding was provided by the American College of Lifestyle Medicine (MCK, MMSW). Expert panel members (LWG, JF, MC, and WW) received honoraria from the American College of Lifestyle Medicine for their contributions to this work. Additional funding was provided by the Centers for Disease Control, grant 5U48DP005023–04 (DK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
Consent for publication
MC was a consultant to the National Academy of Sciences, Engineering, and Medicine’s Dietary Reference Intakes panel on sodium and potassium and also has received funding from Agency for Healthcare Research and Quality to conduct systematic reviews but declares no competing interests. All other authors declare that no competing interests exist.
Cite this article
Katz, D.L., Karlsen, M.C., Chung, M. et al. Hierarchies of evidence applied to lifestyle Medicine (HEALM): introduction of a strength-of-evidence approach based on a methodological systematic review. BMC Med Res Methodol 19, 178 (2019) doi:10.1186/s12874-019-0811-z
- Strength of evidence
- Systematic review
- Lifestyle medicine
- Lifetime effects