Skip to main content
  • Research article
  • Open access
  • Published:

Hierarchies of evidence applied to lifestyle Medicine (HEALM): introduction of a strength-of-evidence approach based on a methodological systematic review



Current methods for assessing strength of evidence prioritize the contributions of randomized controlled trials (RCTs). The objective of this study was to characterize strength of evidence (SOE) tools in recent use, identify their application to lifestyle interventions for improved longevity, vitality, or successful aging, and to assess implications of the findings.


The search strategy was created in PubMed and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline, supplemented by manual searching. Systematic reviews and meta-analyses of intervention trials or observational studies relevant to lifestyle intervention were included if they used a specified SOE tool. Data was collected for each SOE tool. Conditions necessary for assigning the highest SOE grading and treatment of prospective cohort studies within each SOE rating framework were summarized. The expert panel convened to discuss the implications of findings for assessing evidence in the domain of lifestyle medicine.

Results and conclusions

A total of 15 unique tools were identified. Ten were tools developed and used by governmental agencies or other equivalent professional bodies and were applicable in a variety of settings. Of these 10, four require consistent results from RCTs of high quality to award the highest rating of evidence. Most SOE tools include prospective cohort studies only to note their secondary contribution to overall SOE as compared to RCTs. We developed a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding lifetime effects of health behaviors. Assessment of evidence relevant to lifestyle medicine requires a potential adaptation of SOE approaches when outcomes and/or exposures obviate exclusive or preferential reliance on RCTs. This systematic review was registered with the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148].

Peer Review reports


There is at present lively debate in the peer-reviewed literature regarding the nature of evidence supporting specific recommendations pertaining to nutrition [1, 2] and other components of lifestyle medicine [3]. Lifestyle medicine can be defined as the use of behavioral modifications in diet, exercise, sleep, stress, or substance use/exposure to prevent, treat, and potentially reverse lifestyle-related, chronic disease [4]. Such modifications may be implemented in clinical settings or more broadly as public health interventions, environmental changes to reinforce healthy default choices, or as online or distance-based interventions, but all with the intent to alter health behaviors among individuals.

Assessment of scientific evidence for a given question has evolved in academic publications from the presentation of an individual author’s conclusions into a formalized process [5,6,7] that involves conducting a systematic review of all available evidence within predetermined inclusion criteria. A common outcome of a systematic review is an assessment of “strength of evidence” (SOE) by the authors, starting with individual assessments of study quality followed by the use of a SOE grading tool to synthesize and summarize findings from all included studies. SOE is then often used to inform the next step in public health and clinical practice, writing practice recommendations, or assessing strength of recommendations [8, 9].

Evaluating SOE for research questions related to health behaviors of individuals is of high importance for public health professionals and clinicians focusing on behavioral modification as part of clinical practice. Interest in lifestyle medicine is rapidly expanding globally [10]. Lifestyle choices can have a major impact on burden of disease and premature death, even if the exact contributions of different components (exercise, diet, smoking, etc.) in the context of total lifestyle pattern are debated. Among the more frequent criticisms of lifestyle medicine is that conclusions and practice recommendations are not adequately informed by randomized controlled trials (RCTs) [11, 12]. Counter-arguments, noting the importance of other sources of evidence, have been published as well, at times in tandem [13, 14]. Thus, the importance of reliably interpreting relevant evidence about lifestyle choices has never been greater [15].

The majority of current systems for evaluating scientific evidence are well-suited to conventional medical treatment such as pharmacotherapy and discrete procedures. The movement towards evidence-based medicine (EBM) in recent years has emphasized the commonly accepted hierarchy of evidence and generally places results from RCTs above other study designs [16, 17]. While this is appropriate in many instances, RCTs are subject to specific biases and may not serve to address questions concerning the lifetime effects of health behaviors [18, 19].

Specifically, RCTs have methodological limitations that impede application to the investigation of longevity, overall vitality [20], compression of morbidity [21], and the lifetime [22,23,24] effects of diet, exercise, stress, sleep habits, and other lifestyle components, as well as ethical considerations depending on the research question. Such limitations have been examined in previous decades [18] and, more recently, in new publications highlighting the drawbacks of over-reliance on an RCT-centric model [19]. These limitations are particularly relevant in the context of developing healthcare practice guidelines for treatments that can withstand the challenges of real-world applications [16, 25]. Some such limitations of the RCT model include the following:

  1. 1.

    Cost constraints and challenges with adherence makes it difficult to randomize individuals to lifestyle interventions and maintain the prescribed behaviors for sufficient time periods (decades) to investigate the effects of such exposures on mortality or long-term morbidity [26, 27].

  2. 2.

    Blinding of the treatment group is only possible when the treatment is ostensibly similar to the placebo. While this is straightforward in drug trials, it is difficult at best, and often impossible when modifying health behaviors.

  3. 3.

    The generalizability of results in intervention trials to the broader population may be limited.

Some debate exists around differences in results seen between observational studies and RCTs. Depending on the research questions, evidence from observational cohort studies may be substantially more informative in drawing conclusions about overall SOE [28]. There may be a particular advantage in hybridizing evidence sources, recognizing that different evidence sources, from bench research, to intervention studies in humans, to observational epidemiology, make distinct contributions to understanding [17, 29, 30]. Therefore, it would be useful to have a method of evaluating SOE that is tailored to assessing lifestyle interventions and that can offer a more holistic assessment of evidence spanning diverse methods.

We conducted a methodologic systematic review of SOE tools to inform the answer to this question: When RCTs cannot, for whatever reason, serve as the primary evidence source, are there alternative assemblies of evidence that can be used to achieve comparable confidence in a given exposure-outcome relationship?

The research team was convened by the American College of Lifestyle Medicine (ACLM) in joint auspices with the True Health Initiative (THI) to (1) conduct a methodological systematic review of SOE grading tools in recent or current use to characterize which assemblies of evidence produce an evidence rating of highest strength, and (2) analyze the findings and their implications for potentially developing a new grading tool to evaluate SOE in the specific context of lifestyle medicine, where often good RCTs are not available or possible.


The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement was followed in reporting this systematic review [31]. The protocol was prospectively developed and registered on the International Prospective Register of Systematic Reviews, PROSPERO, [CRD42018082148] [32, 33]. An expert panel (Additional file 1) in evidence-based medicine and its application to nutrition/lifestyle behaviors was convened to assess the findings and make recommendations.

Search strategy

The search strategy was built in PubMed in consultation with a librarian and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline. The databases were searched for studies containing keywords related to either lifestyle or longevity. To identify only SOE tools in recent or current use, searches included studies published during the previous five years from the start of the project, from 01/01/2013–11/07/2017. There were seven exposures of interest related to lifestyle: diet, exercise, stress, social relationships/support, addiction(s), sleep, and genetic-based factors with potential for epigenetic modification. Additional search terms were included to restrict the scope of our literature search to papers related to avoidance of chronic disease: longevity, vitality, and healthy or successful aging. Keywords used in the search strategy are presented in Table 1. Search strategies were restricted to systematic reviews and meta-analyses conducted among humans and published in English, as the research team was not able to read or screen non-English papers. Umbrella reviews (systematic reviews of systematic reviews) were not included. To further focus on evaluation of evidence related to the lifetime effects of health behaviors and healthy aging, PubMed and PsycINFO searches were limited to studies in participants 65+ years of age. The complete search strategy for all five databases is presented in Additional file 2.

Table 1 Keywords used in the search strategy to identify systematic reviews and meta-analyses using relevant strength of evidence (SOE) tools

Inclusion/exclusion criteria

To identify relevant SOE tools in current or recent use, we included systematic reviews and meta-analyses of intervention trials or observational studies that both examined lifestyle medicine exposures and outcomes and evaluated SOE using a specified SOE tool. The inclusion and exclusion criteria applied in abstract and full-text screening are presented in Table 2. Included studies were required to contain only studies conducted in human adults and with at least one comparison group. Studies were excluded if they were conducted in children, healthcare workers, animals, or in vitro or if they only included single-arm trials (i.e., no comparison group). Studies were also excluded if they utilized any pharmaceutical- or supplement-based interventions, utilized genome-wide-association-studies (GWAS), or focused on research methods, validation of instruments or questionnaires, medical devices, or other assays. Additionally, given our focus on lifestyle medicine, studies were excluded if they examined research questions not relevant to lifestyle medicine (e.g., focused on the domains of injury severity, effectiveness of diagnostic tools or medical devices, or mechanistic questions that are tangential to lifestyle interventions or that were not clearly modifiable by lifestyle factors).

Table 2 Inclusion/exclusion criteria applied in abstract and full-text screening1

Study selection process

After merging results from all five databases and removing duplicates, all citations were title-screened by a single investigator [MK] to exclude in vitro, cell and stem cell studies, animal studies, and studies whose designs were clearly not a systematic review or meta-analysis, such as studies that used other study designs in the title (case report, randomized controlled trial, prospective cohort study, etc.) All studies with ambiguous titles were included at this stage of screening. All abstracts identified via the literature searches were then independently double-screened (independently screened by two different investigators) [MK, MSW, AS] using the inclusion and exclusion criteria (Table 2) via the open-source, online software Rayyan [34]. Full-text articles were retrieved for all abstracts deemed potentially relevant. Keyword text mining was performed to identify papers that mentioned text relevant to the use of a SOE tool [MK, MSW]. Full keyword search terms are presented in Additional file 3. Articles containing one or more of the keywords were then independently double-screened based on inclusion and exclusion criteria [MK, MSW]. All abstract and full-text screening conflicts were resolved through group discussion and final decisions reached by group consensus.

Additionally, the results from the systematic search process were complemented with manual searching on websites of major agencies recommended by the expert panel that conduct or commission systematic reviews. Agency websites were searched for officially adopted SOE tools [MK, MSW, AS]. A list of unique tools was compiled from the combination of the systematic and manual searches [MK].

Data extraction

Data extraction forms were created and received approval from the entire research team prior to use. The information extracted included the following: date first published; purpose of the evaluation; intended audience; number of levels of SOE; the definition of the highest level of SOE; and the placement of cohort studies in the framework of SOE. All data extractors initially extracted 10% of the articles to pilot uniformity of extractions. For all remaining articles, each article was extracted by one investigator and reviewed and confirmed by a second [MK, MSW, AS]. Any disagreements were discussed among the research team and resolved via group consensus.

Risk of bias (ROB) in individual studies

As this systematic review’s focus is on SOE grading systems related to lifestyle medicine outcomes and not studies’ specific lifestyle-related findings, ROB assessments were not conducted. However, if ROB assessments played a role in the included SOE grading systems, details were extracted.

Data synthesis

Data were summarized in narrative form with regard to the conditions necessary for assigning the highest SOE grading (e.g., for assigning a grade “A” or level “1” rating). Next, the treatment of prospective cohort studies within each SOE rating framework was qualitatively summarized [MK, MSW, AS].


The PRISMA flow diagram for study selection and exclusion is presented in Fig. 1. The manual search guided by the expert panel identified a total of eight unique SOE tools. The systematic search strategy identified a total of 1196 studies. Of these, 267 studies contained one or more relevant keywords. From these, a total of 33 studies mentioned using a specific SOE tool: 23 studies used Grading of Recommendations, Assessment, Development and Evaluation (GRADE) [35], which had previously been identified in the manual search, and 10 studies used a total of seven other unique SOE tools. Thus, a total of 15 unique tools are presented in Table 3.

Fig 1.
figure 1

PRISMA flow diagram

All 15 tools rated SOE using three to five levels, with the exception of the US Food and Drug Administration (FDA) tool in reference to qualified health claims [36] (two levels). Of the 15 tools included, five were lesser-known methods defined by authors and primarily related to pain or physical rehabilitation and treatment [37,38,39,40,41,42,43,44,45,46]. The other 10 SOE tools were developed and used by well-known agencies and are applicable in a variety of settings [35, 36, 47,48,49,50,51,52,53]. Of these 10, four clearly require consistent results from RCTs of high quality to award the highest rating of evidence: GRADE [35], the FDA tool in reference to health claims for food products [36], the American College of Cardiology / American Heart Association Task Force on Practice Guidelines Levels of Evidence [54], and the Evidence-based Practice Center (EPC) method for grading SOE [51].

Four SOE tools describe more flexibility in the use of study design in determining ratings: the Community Preventive Services Task Force method [47] references study design and its “suitability for answering the research question;” the Grading System from the Academy of Nutrition and Dietetics [50] describes “studies of strong design for the question;”, the Johanna Briggs Levels of Evidence identifies different levels of evidence under the separate headings effectiveness, diagnosis, prognosis, economic evaluations, or meaningfulness [52], and the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence [53] uses a grid of five levels of evidence, where each level is specifically tailored to seven different kinds of research questions and supports a variety of combinations of quantity and quality of evidence depending on the specific research question.

With the exception of the OCEBM Levels of Evidence [53] specific mention of observational studies was made only in reference to their secondary contribution to overall SOE from RCTs, unless RCTs were methodologically flawed.

Table 3 Strength-of-evidence (SOE) rating tools

Conceptualization of SOE approach specific to lifestyle Medicine

Upon completion of the systematic review, the expert panel convened to discuss the findings. The results confirmed that the following methodological elements within existing SOE tools in recent use are lacking:

  1. 1.

    Criteria to evaluate exposure-outcome relationships examined over years/decades/lifetimes

  2. 2.

    Criteria to evaluate behaviors/exposures used in lifestyle medicine that may not allow for randomization or blinding (e.g., smoking, long-term dietary patterns, etc.)

  3. 3.

    Guidance to synthesize findings from diverse study designs, except to prioritize RCTs over observational studies.

To address these issues, the Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) investigators enumerated the particular contributions of diverse research methods into a complete understanding of exposure/treatment effects, as shown in Table 4.

Table 4 Contributions of evidence from the major categories of research approaches

Based on this simple framework, a new method for selecting the criteria by which SOE can be assessed was developed, titled Evidence Threshold Pathway Mapping (Table 5). It is intended to formalize and make explicit the decision process of which method or tool to use to evaluate SOE. With strength defined operationally as the relevant “threshold” value for some level of confidence, this potential methodologic innovation offers an opportunity to identify the assemblies of evidence that are most appropriate for a given research question, such as change in intermediate risk factors, short-term alleviation of disease symptoms, long-term improvement in diagnosed disease, or long-term prevention. The basic propositions underlying Evidence Threshold Pathway Mapping are that (a) different methods of research are best suited for making different yet complementary contributions to the overall weight of relevant evidence, and (b) different assemblies of evidence can produce the same aggregate strength or confidence. We recognize that in the absence of RCT data for treatment effects, certainty about treatment effects from other types of evidence may be more limited; thus, there is a basis to weight the contributions of RCTs preferentially. However, other types of evidence may still offer a spectrum of certainty or additional context for understanding.

Table 5 Evidence Threshold Pathway Mapping

Also implicit in this approach is the contention that various research methods serve different objectives related to evidence about a causal pathway. Bench science and animal model studies are most often used to establish clear and decisive evidence of mechanisms but cannot establish in vivo effects in humans [29]. Controlled intervention studies, and most notably RCTs, are used to establish attribution with confidence, while minimizing bias and controlling for both known and unknown confounders [17]. However, RCTs are not always ethically or practically feasible and they are demanding to implement at the population level, or over time periods relevant to lifetime vitality [30]. They also can introduce sampling bias that may greatly limit generalizability or external validity. Observational epidemiology, notably prospective cohort studies and even ethnographic studies, can readily assess associations at scale and over extended time periods (decades), but these are subject to bias including sampling bias, residual confounding, and they lack the capacity of RCTs to assign attribution with clarity [30].

Accordingly, evidence is strongest when the unique contributions of these diverse methods are synthesized. Making conclusions by drawing from a diversity of evidence sources can potentially allow for confidence in study design methods from one type of research, confidence in attribution from another type, confidence in effects at scale from yet another, and confidence in effects over extended timelines from another still. This amalgamation of complementary evidence is especially important when research questions cannot be readily answered by one study design alone (e.g., What dietary pattern produces the best health outcomes over a lifetime?) [55]. Such considerations are a subject of active discussion in nutrition research [56, 57].

Thus, we introduce a new construct- Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) shown in Table 6, to illustrate means of assessing SOE in future systematic reviews within the domain of lifestyle medicine when the use of GRADE or another SOE tool is not appropriate.

Table 6 Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) Strength of Evidence (SOE) Approach*

HEALM incorporates the variety of sources of evidence available and synthesizes their contributions into one rating. It is important to note that the method described in Tables 5-6 suggests one specific framework for handling a set of considerations around SOE. Alternative ways of handling such considerations including using a conventionally defined tool such as GRADE, not utilizing a predetermined scoring system, or uniquely adapting an existing tool to the research question being asked. We introduce Evidence Threshold Pathway Mapping and HEALM to illustrate one example of a suitable, customized approach for researchers in lifestyle medicine that can be applied, tested, and validated in practice. The proposed approach for evaluating SOE is informed by the flexibility and specificity presented in OCEBM [53]. HEALM adapts this approach to the specific exigencies of lifestyle medicine, while placing an emphasis on the alignment of research methods with specific questions related to causal pathways. To identify when use of such a tool might be appropriate, we suggest employing Evidence Threshold Pathway Mapping (Table 5) to map the pathway for evidence evaluation along the branches of a simple decision tree. For example, this process produces a suggestion to use the HEALM tool for all research questions concerning lifetime cumulative effects of specific health behaviors, as lifetime effects cannot be assessed in < 5 years. However, it suggests using GRADE [35] for other questions that are feasibly answered with RCTs.


Lifestyle behaviors are among the leading determinants of health outcomes, with non-communicable disease causing nearly three-quarters of death globally [58]. Dietary patterns have recently risen to the very top of this list [15], and there is intense debate about the strength and reliability of pertinent evidence [1,2,3]. The majority of current systems for evaluating scientific evidence are well-suited to evaluating pharmaceutical approaches to managing disease, but currently a system for evaluating SOE particular to lifestyle medicine does not exist.

Assessment of SOE requires grading the methodological quality and ROB of individual included studies, assessing the consistency and internal validity of studies addressing a specific research question, and forming conclusion statements. Such SOE conclusions can thus inform the discussion on the weight of evidence, informed by multiple studies providing for external validity or generalizability to various populations, settings, and circumstances.

Evidence Threshold Pathway Mapping contends that the same level of confidence, and the same strength of evidence, can be achieved by a variety of assemblies of evidence. The approach respects the unique value of RCTs in establishing attribution and does not assume RCTs are interchangeable with other study designs. Rather, Evidence Threshold Pathway Mapping acknowledges that RCTs may be precluded for various reasons with regard to a given outcome and that other complementary evidence should be considered. Even then, such trials may contribute to understanding by assessing attribution with use of interim measures, and/or surrogate markers. This method of identifying the SOE approach used for evaluation based on the nature of the question being asked is informed by the approach taken in the OCEBM tool [53], which tailors SOE evaluation for different types of research questions.

HEALM, derived from application of the Evidence Threshold Pathway Mapping approach, is one unique, potential approach organized to frame discussion of existing evidence available to answer specific research questions relevant to lifestyle medicine when existing tools such as GRADE are not viable options (i.e., the question is not fully addressable through RCTs). The scoring, similar to other SOE tools, relies on expert consensus, but is also informed by quantitative scoring considerations used in umbrella reviews [59] to evaluate results from multiple meta-analyses. While grading SOE does not necessarily mean meta-analyses will always be conducted, a quantitative framework to guide discussion will lead to greater consistency of results. HEALM defines categorical levels of SOE, as is conventionally done when evaluating evidence. However, it should be noted that such categories are derived from a continuum of SOE and that the value of the categories is to increase the utility of the tool for communicating findings. The intended purpose of HEALM is to evaluate SOE, which can then be used to develop strength of recommendation-based practice statements. The construct first introduced here may gain traction as is; it may be revised and refined by others; or it may be replaced outright if an alternative metric serving the same goals performs better.

The need for innovation in SOE assessment is in part because the RCT holds a position of relative primacy in the adjudication of medical evidence. Arguments favoring reliance on RCTs rightly invoke the merits in this methodology, namely defense against diverse kinds of bias, and protection against confounders both known and unknown [17] thus prioritizing internal validity. There are, however, diverse and valid concerns with the limitations of RCTs [30] in achieving external validity.

Also of concern are the cases in which observational and intervention trial results appear to be in conflict with one another. In some cases, RCTs may be testing different hypotheses than observational studies, and conclusions from one investigation may not be generalizable to all populations. For example, a review analysis on the use of hormone replacement therapy (HRT) among women in the Women’s Health Initiative affirms the consistency of findings across observational and intervention data if the age at time of starting HRT is considered [60].

A recent Cochrane systematic review concluded such differences are likely not due to differences in study design alone; rather, RCTs and observational studies tend to produce similar effect sizes for a range of health outcomes and disagreements are likely due to other study characteristics [61] such as testing different hypotheses [60] or duration of follow-up. While there are examples of RCTs that document outcomes after several years of follow-up post- intervention [62, 63], the challenges of adherence [27] severely limit feasibility of continuous interventions over decades. To the authors’ knowledge, there are no RCTs that have successfully and continuously implemented an intervention, especially one with a potentially small effect size, for the decades necessary to test “lifetime” effects. Thus, the prevailing impression that results from RCTs are consistently superior may be exaggerated, with the benefits and risks of hormone replacement therapy providing an example of the partial contributions to understanding made possible by both RCTs and observational cohort studies [64,65,66,67].

In contrast, there are clear cases in which observational studies offer a superior method of evaluating questions concerning the cumulative, lifetime effects of lifestyle practices. A key example of such trials whose recruitment is designed to maximize the number of endpoints is the Alpha-Tocopherol, Beta-carotene Lung Cancer Prevention (ATBC) Study which targeted male smokers [28]. In capturing hard endpoints such as cancer and cancer-related mortality, short-term RCTs would be of insufficient duration to see the outcome of interest, as well as being impossible to implement with exposures like smoking for ethical reasons.

The HEALM tool scores evidence, lending particular weight to RCTs for the clarification of causal effects and attribution. The tool, however, allows for rating evidence as strong even if RCT data are not more than suggestive, provided evidence from all other complementary research approaches are decisive and aligned. More importantly, short-term evidence from RCTs, or focus on isolated biomarkers, absent any suitably long-term data addressing hard outcomes would not score as “strong” in the realm of lifestyle medicine because of the great potential divide between short and long-term effects. As an example, many serious infectious diseases lower weight and blood lipids; such “favorable” trends in biomarkers are obviously not indicative of beneficial health effects in the long term. This adaptation of established approaches readily accommodates the imperative of judging the impact of lifestyle practices on health outcomes over the full human life span.

The strength of this study was to take an approach of a methodological systematic review to capture existing and recently used SOE tools, thus ensuring that a new method proposed would offer a novel contribution to address current methodological gaps. Limitations of this study included the focus in the search strategy on healthy aging as an outcome, rather than risk for specific chronic diseases. The search strategy was constructed in alignment with the target outcomes of lifestyle medicine practice (healthy aging, as opposed to chronic disease), and inclusion of all major chronic disease outcomes would not have been practical due to the large number of search results. Additionally, the search strategy was limited to systematic reviews of studies conducted among those ≥65 years, not because lifestyle medicine is only relevant for older populations, but because this focused the search strategy to identify studies in the domain of longevity. SOE tools used in these contexts would be potential best matches for evaluating evidence concerning other lifestyle medicine-type questions. However, manual searching for SOE tools based on expert panel recommendations augmented the systematic review results to the degree that all major tools known by the expert panel are included in our results.

Finally, the HEALM construct is dependent on conclusions about the “plurality” of evidence from distinct research methods. Other than results produced from systematic review of meta-analysis, there is no universal standard for sufficient or sufficiently consistent evidence to establish the veracity of a given causal pathway or weight of the evidence for a given research topic. Even meta-analyses and systematic reviews fail to reach this standard, because in “crowded” research domains more than one such study is common and they may conflict with one another. The Community Preventive Services Task Force (CPSTF) [47] provides some guidance on assigning strength of recommendations based on SOE conclusions by suggesting that inconsistent evidence should lead to separate recommendations for specific populations, and that no conclusions should be reached in the case of conflicting evidence. However, this guidance does not provide a framework for synthesizing strength and weight of evidence more broadly. Further, a limitation of HEALM is that it utilizes categories to assign relative levels of confidence, though this limitation is common to existing SOE tools.

The problem of establishing an operational definition for the “weight of evidence,” or a decisive plurality of studies, is in no way specific to lifestyle medicine. This is a generic challenge pertaining to all assessments of overall evidence, and thus deemed beyond the scope of this particular effort. This group simply notes the importance of this issue, and its pertinence to both Evidence Threshold Pathway Mapping and HEALM. This paper invites attention to the matter and highlights the opportunity to fortify operational definitions in this area.

This project was commissioned with a preferential focus on lifestyle medicine, but the implications apply broadly to public health. Lifestyle practices and exposures- dietary patterns, physical activity patterns, sleep patterns, tobacco and alcohol exposures, psychological stressors, social connections- while uniquely emphasized in lifestyle medicine (4), pertain to all fields of medicine and public health and to all health professionals.

Future research should test application of Evidence Threshold Pathway Mapping and HEALM by conducting systematic reviews on specific research questions in the domain of lifestyle medicine. The HEALM construct should evolve, informed by research in which it is applied.


SOE tools in current use are generally poorly suited to long-term effects of lifestyle choices such as diet, exercise, sleep, and stress. Evidence Threshold Pathway Mapping, a method for identifying multiple assemblies of evidence to achieve a given grade, extends the robust assessment of evidence to a wider array of questions important to medicine and public health. HEALM is proposed as one example of a tool specifically adapted to questions in lifestyle medicine and nutrition. Application, testing, and validation of the performance of HEALM and consideration of its relevance to this domain of medicine are encouraged.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study. All data used in this systematic review are accessible via the papers referenced in the manuscript.



American College of Lifestyle Medicine


Alpha-Tocopherol, Beta-carotene Lung Cancer Prevention


Community Preventive Services Task Force


evidence-based medicine


Evidence-based Practice Center


Food and Drug Administration


Grading of Recommendations, Assessment, Development and Evaluation




Hierarchies of Evidence Applied to Lifestyle Medicine


hormone replacement therapy


Oxford Centre for Evidence-Based Medicine


Preferred Reporting Items for Systematic Reviews and Meta-analyses


randomized controlled trial


risk of bias


strength of evidence


True Health Initiative


  1. Ioannidis JP, Trepanowski JF. Disclosures in nutrition research: why it is different. JAMA. 2018;319(6):547–8.

    Article  PubMed  Google Scholar 

  2. Laville M, Segrestin B, Alligier M, Ruano-Rodríguez C, Serra-Majem L, Hiesmayr M, et al. Evidence-based practice within nutrition: what are the barriers for improving the evidence and how can they be dealt with? Trials. 2017;18(1):425.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Jørgensen T, Jacobsen RK, Toft U, Aadahl M, Glümer C, Pisinger C. Effect of screening and lifestyle counselling on incidence of ischaemic heart disease in general population: Inter99 randomised trial. BMJ. 2014;348:g3617.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Katz DLFE, Medicine FMDL. Maxcy-Rosenau-last public health and preventive Medicine. 16th Ed. In: Production: January; 2019.

    Google Scholar 

  5. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ (Clinical research ed). 2004;328(7454):1490.

    Article  Google Scholar 

  6. Owens DK, Lohr KN, Atkins D, Treadwell JR, Reston JT, Bass EB, et al. AHRQ series paper 5: grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the effective health-care program. J Clin Epidemiol. 2010;63(5):513–23.

    Article  PubMed  Google Scholar 

  7. Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the journal of clinical epidemiology. J Clin Epidemiol. 2011;64(4):380–2.

    Article  PubMed  Google Scholar 

  8. Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schünemann HJ. Rating quality of evidence and strength of recommendations: what is “quality of evidence” and why is it important to clinicians? BMJ: British medical journal. 2008;336(7651):995.

    Article  PubMed  Google Scholar 

  9. Guyatt GH, Oxman AD, Kunz R, Falck-Ytter Y, Vist GE, Liberati A, et al. Rating quality of evidence and strength of recommendations: going from evidence to recommendations. BMJ: British Medical Journal. 2008;336(7652):1049.

    Article  PubMed  Google Scholar 

  10. Kushner RF, Sorensen KW. Lifestyle medicine: the future of chronic disease management. Curr Opin Endocrinol Diabetes Obes. 2013;20(5):389–95.

    Article  PubMed  Google Scholar 

  11. Ioannidis JP. We need more randomized trials in nutrition—preferably large, long-term, and with negative results. Oxford University Press. 2016.

  12. Rosen L, Manor O, Engelhard D, Zucker D. In defense of the randomized controlled trial for health promotion research. Am J Public Health. 2006;96(7):1181–6.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Willett WC. Diet and health—finding a path to Veritas. Eur J Epidemiol. 2018;33(2):127–35.

    Article  PubMed  Google Scholar 

  14. Barnard ND, Willett WC, Ding EL. The misuse of meta-analysis in nutrition research. JAMA. 2017;318(15):1435–6.

    Article  PubMed  Google Scholar 

  15. Mokdad AH, Ballestros K, Echko M, Glenn S, Olsen HE, Mullany E, et al. The state of US health, 1990-2016: burden of diseases, injuries, and risk factors among US states. JAMA. 2018;319(14):1444–72.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Horwitz RI, Hayes-Conroy A, Caricchio R, Singer BH. From evidence based Medicine to Medicine based evidence. Am J Med. 2017.

  17. Hannan EL. Randomized clinical trials and observational studies. guidelines for assessing respective strengths and limitations JACC Cardiovasc Interv. 2008;1(3):211–7.

    Article  PubMed  Google Scholar 

  18. Feinstein AR, Horwitz RI. Problems in the “evidence” of “evidence-based medicine”. Am J Med. 1997;103(6):529–35.

    Article  CAS  PubMed  Google Scholar 

  19. Frieden TR. Evidence for health decision making - beyond randomized, controlled trials. N Engl J Med. 2017;377(5):465–75.

    Article  PubMed  Google Scholar 

  20. Guerin E. Disentangling vitality, well-being, and quality of life: a conceptual examination emphasizing their similarities and differences with special application in the physical activity domain. J Phys Act Health. 2012;9(6):896–908.

    Article  PubMed  Google Scholar 

  21. Fries J, Green L, Levine S. Health promotion and the compression of morbidity. Lancet. 1989;333(8636):481–3.

    Article  Google Scholar 

  22. Charlton BM, Rich-Edwards JW, Colditz GA, Missmer SA, Rosner BA, Hankinson SE, et al. Oral contraceptive use and mortality after 36 years of follow-up in the Nurses' health study: prospective cohort study. BMJ. 2014;349:g6356.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Elliot AJ, Mooney CJ, Infurna FJ, Chapman BP. Associations of lifetime trauma and chronic stress with C-reactive protein in adults ages 50 years and older: examining the moderating role of perceived control. Psychosom Med. 2017;79(6):622–30.

    Article  CAS  PubMed  Google Scholar 

  24. Reinikainen J, Laatikainen T, Karvanen J, Tolonen H. Lifetime cumulative risk factors predict cardiovascular disease mortality in a 50-year follow-up study in Finland. Int J Epidemiol. 2015;44(1):108–16.

    Article  PubMed  Google Scholar 

  25. Ioannidis JP. Some main problems eroding the credibility and relevance of randomized trials. Bull NYU Hosp Jt Dis. 2008;66(2):135–9.

    PubMed  Google Scholar 

  26. Seidelmann SB, Claggett B, Cheng S, Henglin M, Shah A, Steffen LM, et al. Dietary carbohydrate intake and mortality: a prospective cohort study and meta-analysis. Lancet Public Health. 2018;3(9):e419–e28.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Crichton GE, Howe PR, Buckley JD, Coates AM, Murphy KJ, Bryan J. Long-term dietary intervention trials: critical issues and challenges. Trials. 2012;13:111.

    Article  PubMed  PubMed Central  Google Scholar 

  28. The alpha-tocopherol, beta-carotene lung cancer prevention study: design, methods, participant characteristics, and compliance. The ATBC Cancer Prevention Study Group. Ann Epidemiol. 1994;4(1):1–10.

  29. Nakamura R. Animal models and basic science—bench to bedside: session introduction. ILAR J. 2011;52(Suppl_1):493.

    Google Scholar 

  30. Sanson-Fisher RW, Bonevski B, Green LW, D'Este C. Limitations of the randomized controlled trial in evaluating population-based health interventions. Am J Prev Med. 2007;33(2):155–61.

    Article  PubMed  Google Scholar 

  31. Moher D, Liberati A, Tetzlaff J, Altman DG. Group P. preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9, W64.

    Article  PubMed  Google Scholar 

  32. PROSPERO - International prospective register of systematic reviews [Available from:].

  33. David Katz, Jonathan Fielding, Lawrence Green, Ralph Horwitz, John Ioannidis, Walter Willett, Mei Chung, Micaela Karlsen, Marissa Shams-White, Ayumi Saito, Deena Wang. Hierarchies of evidence applied to lifestyle Medicine (HEaLM): a methodological systematic review of evidence grading tools to inform development of the best method to assess strength of evidence for lifestyle medicine interventions and related clinical outcomes, including longevity and healthy aging. PROSPERO 2018 CRD42018082148 available from:

  34. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5(1):210.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.

    Article  PubMed  Google Scholar 

  36. Food, Administration D. Guidance for industry and FDA: interim evidence-based ranking system for scientific data. July 10, 2003.

  37. Sallis JF, Prochaska JJ, Taylor WC. A review of correlates of physical activity of children and adolescents. Med Sci Sports Exerc. 2000;32(5):963–75.

    Article  CAS  PubMed  Google Scholar 

  38. Bully P, Sanchez A, Zabaleta-del-Olmo E, Pombo H, Grandes G. Evidence from interventions based on theoretical models for lifestyle modification (physical activity, diet, alcohol and tobacco use) in primary care settings: a systematic review. Prev Med. 2015;76 Suppl:S76–93.

    Article  PubMed  Google Scholar 

  39. van Poppel MN, Koes BW, Smid T, Bouter LM. A systematic review of controlled clinical trials on the prevention of back pain in industry. Occup Environ Med. 1997;54(12):841–7.

    Article  PubMed  PubMed Central  Google Scholar 

  40. Slavin RE. Best evidence synthesis: an intelligent alternative to meta-analysis. J Clin Epidemiol. 1995;48(1):9–18.

    Article  CAS  PubMed  Google Scholar 

  41. Mansi S, Milosavljevic S, Baxter GD, Tumilty S, Hendrick P. A systematic review of studies using pedometers as an intervention for musculoskeletal diseases. BMC Musculoskelet Disord. 2014;15(1):231.

    Article  PubMed  PubMed Central  Google Scholar 

  42. Geraedts H, Zijlstra A, Bulstra SK, Stevens M, Zijlstra W. Effects of remote feedback in home-based physical activity interventions for older adults: a systematic review. Patient Educ Couns. 2013;91(1):14–24.

    Article  PubMed  Google Scholar 

  43. Singh A, Uijtdewilligen L, Twisk JW, Van Mechelen W, Chinapaw MJ. Physical activity and performance at school: a systematic review of the literature including a methodological quality assessment. Arch Pediatr Adolesc Med. 2012;166(1):49–55.

    Article  PubMed  Google Scholar 

  44. Peurala SH, Karttunen AH, Sjögren T, Paltamaa J, Heinonen A. Evidence for the effectiveness of walking training on walking and self-care after stroke: a systematic review and meta-analysis of randomized controlled trials. J Rehabil Med. 2014;46(5):387–99.

    Article  PubMed  Google Scholar 

  45. Stuck AE, Walthert JM, Nikolaus T, Bula CJ, Hohmann C, Beck JC. Risk factors for functional status decline in community-living elderly people: a systematic literature review. Soc Sci Med. 1999;48(4):445–69.

    Article  CAS  PubMed  Google Scholar 

  46. van der Vorst A, Zijlstra GR, De Witte N, Duppen D, Stuck AE, Kempen GI, et al. Limitations in activities of daily living in community-dwelling people aged 75 and over: a systematic literature review of risk and protective factors. PLoS One. 2016;11(10):e0165127.

    Article  PubMed  PubMed Central  Google Scholar 

  47. Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Aguero L, Truman BI, et al. Developing an evidence-based guide to community preventive services--methods. The task force on community preventive services. Am J Prev Med. 2000;18(1 Suppl):35–43.

    Article  CAS  PubMed  Google Scholar 

  48. Moyer V, Bibbins-Domingo K. The US preventive services task force: what is it and what does it do? N C Med J. 2015;76(4):238–42.

    PubMed  Google Scholar 

  49. Scientific Report of the 2015 Dietary Guidelines Advisory Committee. 2015.

  50. Handu D, Moloney L, Wolfram T, Ziegler P, Acosta A, Steiber A. Academy of nutrition and dietetics methodology for conducting systematic reviews for the evidence analysis library. J Acad Nutr Diet. 2016;116(2):311–8.

    Article  PubMed  Google Scholar 

  51. Berkman ND, Lohr KN, Ansari MT, Balk EM, Kane R, McDonagh M, et al. Grading the strength of a body of evidence when assessing health care interventions: an EPC update. J Clin Epidemiol. 2015;68(11):1312–24.

    Article  PubMed  Google Scholar 

  52. Institute JB. New JBI levels of Evidence 2013. Available from:

  53. Oxford Center for Evidence-Based Medicine Levels of Evidence. Accessed online May Afhwcno-l-o-e.

  54. Guidelines ACoCAHATFoP. Methodology Manual and Policies From the ACCF/AHA Task Force on Practice Guidelines2010 May 2018. Available from:

  55. Katz DL, Meller S. Can we say what diet is best for health? Annu Rev Public Health. 2014;35:83–103.

    Article  CAS  PubMed  Google Scholar 

  56. Blake P, Durao S, Naude CE, Bero L. An analysis of methods used to synthesize evidence and grade recommendations in food-based dietary guidelines. Nutr Rev. 2018;76(4):290–300.

    Article  PubMed  PubMed Central  Google Scholar 

  57. National Academies of Sciences E, Medicine. Guiding principles for developing dietary reference intakes based on chronic disease: National Academies Press; 2017.

  58. Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1736–1788.

  59. Belbasis L, Köhler C, Stefanis N, Stubbs B, van Os J, Vieta E, et al. Risk factors and peripheral biomarkers for schizophrenia spectrum disorders: an umbrella review of meta-analyses. Acta Psychiatr Scand. 2018;137(2):88–97.

    Article  CAS  PubMed  Google Scholar 

  60. Manson JE, Chlebowski RT, Stefanick ML, Aragaki AK, Rossouw JE, Prentice RL, et al. Menopausal hormone therapy and health outcomes during the intervention and extended poststopping phases of the Women’s Health Initiative randomized trials. JAMA. 2013;310(13):1353–68.

    Article  CAS  PubMed  Google Scholar 

  61. Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. (2014, 4):MR000034.

  62. Estruch R, Ros E, Salas-Salvado J, Covas MI, Corella D, Aros F, et al. Primary prevention of cardiovascular disease with a Mediterranean diet supplemented with extra-virgin olive oil or nuts. N Engl J Med. 2018;378(25):e34.

    Article  CAS  PubMed  Google Scholar 

  63. Goldberg RB, Bray GA, Marcovina SM, Mather KJ, Orchard TJ, Perreault L, et al. Non-traditional biomarkers and incident diabetes in the diabetes prevention program: comparative effects of lifestyle and metformin interventions. Diabetologia. 2018.

  64. Sarrel PM, Njike VY, Vinante V, Katz DL. The mortality toll of estrogen avoidance: an analysis of excess deaths among hysterectomized women aged 50 to 59 years. Am J Public Health. 2013;103(9):1583–8.

    Article  PubMed  PubMed Central  Google Scholar 

  65. Prentice RL, Langer R, Stefanick ML, Howard BV, Pettinger M, Anderson G, et al. Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women's Health Initiative clinical trial. Am J Epidemiol. 2005;162(5):404–14.

    Article  PubMed  Google Scholar 

  66. Hodis HN, Sarrel PM. Menopausal hormone therapy and breast cancer: what is the evidence from randomized trials? Climacteric : the journal of the International Menopause Society. 2018;21(6):521–8.

    Article  CAS  Google Scholar 

  67. Silverman SL. From randomized controlled trials to observational studies. Am J Med. 2009;122(2):114–20.

    Article  PubMed  Google Scholar 

Download references


The research team thanks Amy Lapidow at the Tufts University Hirsh Health Sciences Library for her assistance with the search strategy along with the members of the HEALM Expert Panel: Mei Chung, Lawrence Green, Jonathan Fielding, and Walter Willett.


The views expressed are those of the authors and do not reflect an official position of the National Cancer Institute or National Institutes of Health.


Funding was provided by the American College of Lifestyle Medicine (MCK, MMSW). Expert panel members (LWG, JF, MC, and WW) received honorarium from the American College of Lifestyle Medicine for their contributions to this work. Additional funding was provided by the Centers for Disease Control, grant 5U48DP005023–04 (DK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations



DK and WW conceived of the research question. MK, MSW, and DK developed the study design. MK, MSW, and AS performed the systematic review, with supervision from MC. DK, MC, LG, JL, and WW interpreted the data and formed conclusions. All authors contributed to the writing and editing of the manuscript.

Corresponding author

Correspondence to M. C. Karlsen.

Ethics declarations

Ethics approval and consent to participate


Consent for publication


Competing interests

MC was a consultant to the National Academy of Sciences, Engineering, and Medicine’s Dietary Reference Intakes panel on sodium and potassium and also has received funding from Agency for Healthcare Research and Quality to conduct systematic reviews but declares no competing interests. All other authors declare that no competing interests exist.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Expert panel (DOCX 13 kb)

Additional file 2:

Search strategy (DOCX 17 kb)

Additional file 3:

Keywords used in text mining (DOCX 13 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Katz, D.L., Karlsen, M.C., Chung, M. et al. Hierarchies of evidence applied to lifestyle Medicine (HEALM): introduction of a strength-of-evidence approach based on a methodological systematic review. BMC Med Res Methodol 19, 178 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: