Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM): introduction of a strength-of-evidence approach based on a methodological systematic review

Background: Current methods for assessing strength of evidence prioritize the contributions of randomized controlled trials (RCTs). The objective of this study was to characterize strength of evidence (SOE) tools in recent use, identify their application to lifestyle interventions for improved longevity, vitality, or successful aging, and assess the implications of the findings.
Methods: The search strategy was created in PubMed and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline, supplemented by manual searching. Systematic reviews and meta-analyses of intervention trials or observational studies relevant to lifestyle intervention were included if they used a specified SOE tool. Data were collected for each SOE tool. Conditions necessary for assigning the highest SOE grading and treatment of prospective cohort studies within each SOE rating framework were summarized. The expert panel convened to discuss the implications of findings for assessing evidence in the domain of lifestyle medicine.
Results and conclusions: A total of 15 unique tools were identified. Ten were tools developed and used by governmental agencies or other equivalent professional bodies and were applicable in a variety of settings. Of these 10, four require consistent results from RCTs of high quality to award the highest rating of evidence. Most SOE tools include prospective cohort studies only to note their secondary contribution to overall SOE as compared to RCTs. We developed a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), to illustrate the feasibility of a tool based on the specific contributions of diverse research methods to understanding lifetime effects of health behaviors. Assessment of evidence relevant to lifestyle medicine requires a potential adaptation of SOE approaches when outcomes and/or exposures obviate exclusive or preferential reliance on RCTs.
This systematic review was registered with the International Prospective Register of Systematic Reviews, PROSPERO [CRD42018082148]. Electronic supplementary material The online version of this article (10.1186/s12874-019-0811-z) contains supplementary material, which is available to authorized users.


Background
There is at present lively debate in the peer-reviewed literature regarding the nature of evidence supporting specific recommendations pertaining to nutrition [1,2] and other components of lifestyle medicine [3]. Lifestyle medicine can be defined as the use of behavioral modifications in diet, exercise, sleep, stress, or substance use/exposure to prevent, treat, and potentially reverse lifestyle-related, chronic disease [4]. Such modifications may be implemented in clinical settings or more broadly as public health interventions, environmental changes to reinforce healthy default choices, or as online or distance-based interventions, but all with the intent to alter health behaviors among individuals.
Assessment of scientific evidence for a given question has evolved in academic publications from the presentation of an individual author's conclusions into a formalized process [5][6][7] that involves conducting a systematic review of all available evidence within predetermined inclusion criteria. A common outcome of a systematic review is an assessment of "strength of evidence" (SOE) by the authors, starting with individual assessments of study quality followed by the use of a SOE grading tool to synthesize and summarize findings from all included studies. SOE is then often used to inform the next step in public health and clinical practice, writing practice recommendations, or assessing strength of recommendations [8,9].
Evaluating SOE for research questions related to health behaviors of individuals is of high importance for public health professionals and clinicians focusing on behavioral modification as part of clinical practice. Interest in lifestyle medicine is rapidly expanding globally [10]. Lifestyle choices can have a major impact on burden of disease and premature death, even if the exact contributions of different components (exercise, diet, smoking, etc.) in the context of total lifestyle pattern are debated. Among the more frequent criticisms of lifestyle medicine is that conclusions and practice recommendations are not adequately informed by randomized controlled trials (RCTs) [11,12]. Counter-arguments, noting the importance of other sources of evidence, have been published as well, at times in tandem [13,14]. Thus, the importance of reliably interpreting relevant evidence about lifestyle choices has never been greater [15].
The majority of current systems for evaluating scientific evidence are well-suited to conventional medical treatment such as pharmacotherapy and discrete procedures. The movement towards evidence-based medicine (EBM) in recent years has emphasized the commonly accepted hierarchy of evidence and generally places results from RCTs above other study designs [16,17]. While this is appropriate in many instances, RCTs are subject to specific biases and may not serve to address questions concerning the lifetime effects of health behaviors [18,19].
Specifically, RCTs have methodological limitations that impede application to the investigation of longevity, overall vitality [20], compression of morbidity [21], and the lifetime [22][23][24] effects of diet, exercise, stress, sleep habits, and other lifestyle components, as well as ethical considerations depending on the research question. Such limitations have been examined in previous decades [18] and, more recently, in new publications highlighting the drawbacks of over-reliance on an RCT-centric model [19]. These limitations are particularly relevant in the context of developing healthcare practice guidelines for treatments that can withstand the challenges of real-world applications [16,25]. Some such limitations of the RCT model include the following:
1. Cost constraints and challenges with adherence make it difficult to randomize individuals to lifestyle interventions and maintain the prescribed behaviors for sufficient time periods (decades) to investigate the effects of such exposures on mortality or long-term morbidity [26,27].
2. Blinding of the treatment group is only possible when the treatment is ostensibly similar to the placebo. While this is straightforward in drug trials, it is difficult at best, and often impossible, when modifying health behaviors.
3. The generalizability of results in intervention trials to the broader population may be limited.
Some debate exists around differences in results seen between observational studies and RCTs. Depending on the research questions, evidence from observational cohort studies may be substantially more informative in drawing conclusions about overall SOE [28]. There may be a particular advantage in hybridizing evidence sources, recognizing that different evidence sources, from bench research, to intervention studies in humans, to observational epidemiology, make distinct contributions to understanding [17,29,30]. Therefore, it would be useful to have a method of evaluating SOE that is tailored to assessing lifestyle interventions and that can offer a more holistic assessment of evidence spanning diverse methods.
We conducted a methodologic systematic review of SOE tools to inform the answer to this question: When RCTs cannot, for whatever reason, serve as the primary evidence source, are there alternative assemblies of evidence that can be used to achieve comparable confidence in a given exposure-outcome relationship?
The research team was convened by the American College of Lifestyle Medicine (ACLM) under joint auspices with the True Health Initiative (THI) to (1) conduct a methodological systematic review of SOE grading tools in recent or current use to characterize which assemblies of evidence produce an evidence rating of highest strength, and (2) analyze the findings and their implications for potentially developing a new grading tool to evaluate SOE in the specific context of lifestyle medicine, where good RCTs are often not available or possible.

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement was followed in reporting this systematic review [31]. The protocol was prospectively developed and registered on the International Prospective Register of Systematic Reviews, PROSPERO, [CRD42018082148] [32,33]. An expert panel (Additional file 1) in evidence-based medicine and its application to nutrition/lifestyle behaviors was convened to assess the findings and make recommendations.

Search strategy
The search strategy was built in PubMed in consultation with a librarian and modified as needed for four additional databases: Embase, AnthropologyPlus, PsycINFO, and Ageline. The databases were searched for studies containing keywords related to either lifestyle or longevity. To identify only SOE tools in recent or current use, searches included studies published during the previous five years from the start of the project, from 01/01/2013 to 11/07/2017. There were seven exposures of interest related to lifestyle: diet, exercise, stress, social relationships/support, addiction(s), sleep, and genetic-based factors with potential for epigenetic modification. Additional search terms were included to restrict the scope of our literature search to papers related to avoidance of chronic disease: longevity, vitality, and healthy or successful aging. Keywords used in the search strategy are presented in Table 1. Search strategies were restricted to systematic reviews and meta-analyses conducted among humans and published in English, as the research team was not able to read or screen non-English papers. Umbrella reviews (systematic reviews of systematic reviews) were not included. To further focus on evaluation of evidence related to the lifetime effects of health behaviors and healthy aging, PubMed and PsycINFO searches were limited to studies in participants 65+ years of age. The complete search strategy for all five databases is presented in Additional file 2.

Inclusion/exclusion criteria
To identify relevant SOE tools in current or recent use, we included systematic reviews and meta-analyses of intervention trials or observational studies that both examined lifestyle medicine exposures and outcomes and evaluated SOE using a specified SOE tool. The inclusion and exclusion criteria applied in abstract and full-text screening are presented in Table 2. Included studies were required to contain only studies conducted in human adults and with at least one comparison group. Studies were excluded if they were conducted in children, healthcare workers, animals, or in vitro or if they only included single-arm trials (i.e., no comparison group). Studies were also excluded if they utilized any pharmaceutical- or supplement-based interventions, utilized genome-wide association studies (GWAS), or focused on research methods, validation of instruments or questionnaires, medical devices, or other assays. Additionally, given our focus on lifestyle medicine, studies were excluded if they examined research questions not relevant to lifestyle medicine (e.g., focused on the domains of injury severity, effectiveness of diagnostic tools or medical devices, or mechanistic questions that are tangential to lifestyle interventions or that were not clearly modifiable by lifestyle factors).

Study selection process
After merging results from all five databases and removing duplicates, all citations were title-screened by a single investigator [MK] to exclude in vitro, cell and stem cell studies, animal studies, and studies whose designs were clearly not a systematic review or meta-analysis, such as studies that used other study designs in the title (case report, randomized controlled trial, prospective cohort study, etc.). All studies with ambiguous titles were included at this stage of screening. All abstracts identified via the literature searches were then independently double-screened (independently screened by two different investigators) [MK, MSW, AS] using the inclusion and exclusion criteria (Table 2) via the open-source, online software Rayyan [34]. Full-text articles were

Data extraction
Data extraction forms were created and received approval from the entire research team prior to use. The information extracted included the following: date first published; purpose of the evaluation; intended audience; number of levels of SOE; the definition of the highest level of SOE; and the placement of cohort studies in the framework of SOE. All data extractors initially extracted 10% of the articles to pilot uniformity of extractions. For all remaining articles, each article was extracted by one investigator and reviewed and confirmed by a second [MK, MSW, AS]. Any disagreements were discussed among the research team and resolved via group consensus.

Risk of bias (ROB) in individual studies
As this systematic review's focus is on SOE grading systems related to lifestyle medicine outcomes and not studies' specific lifestyle-related findings, ROB assessments were not conducted. However, if ROB assessments played a role in the included SOE grading systems, details were extracted.

Data synthesis
Data were summarized in narrative form with regard to the conditions necessary for assigning the highest SOE grading (e.g., for assigning a grade "A" or level "1" rating). Next, the treatment of prospective cohort studies within each SOE rating framework was qualitatively summarized [MK, MSW, AS].

Results
The PRISMA flow diagram for study selection and exclusion is presented in Fig. 1. The manual search guided by the expert panel identified a total of eight unique SOE tools. The systematic search strategy identified a total of 1196 studies. Of these, 267 studies contained one or more relevant keywords. From these, a total of 33 studies mentioned using a specific SOE tool: 23 studies used Grading of Recommendations, Assessment, Development and Evaluation (GRADE) [35], which had previously been identified in the manual search, and 10 studies used a total of seven other unique SOE tools. Thus, a total of 15 unique tools are presented in Table 3.
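The tallies in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative, and it assumes, as the stated total implies, that the seven other tools do not overlap with the eight found by manual search.

```python
# Counts reported in the Results text (variable names are illustrative).
manual_search_tools = 8      # unique SOE tools from the manual search (includes GRADE)
studies_naming_a_tool = 33   # studies from the systematic search that named an SOE tool
studies_using_grade = 23     # of those, studies that used GRADE
additional_tools = 7         # other unique tools found via the systematic search

# Studies that used a tool other than GRADE.
studies_using_other_tools = studies_naming_a_tool - studies_using_grade

# Assumption: the seven additional tools are distinct from the eight manual-search tools.
total_unique_tools = manual_search_tools + additional_tools

print(studies_using_other_tools)  # 10
print(total_unique_tools)         # 15
```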
All 15 tools rated SOE using three to five levels, with the exception of the US Food and Drug Administration (FDA) tool in reference to qualified health claims [36] (two levels). Of the 15 tools included, five were lesser-known methods defined by authors and primarily related to pain or physical rehabilitation and treatment [37][38][39][40][41][42][43][44][45][46]. The other 10 SOE tools were developed and used by well-known agencies and are applicable in a variety of settings [35,36,47,48,49,50,51,52,53]. Of these 10, four clearly require consistent results from RCTs of high quality to award the highest rating of evidence: GRADE [35], the FDA tool in reference to health claims for food products [36], the American College of Cardiology / American Heart Association Task Force on Practice Guidelines Levels of Evidence [54], and the Evidence-based Practice Center (EPC) method for grading SOE [51].

Table 2 (partial): inclusion and exclusion criteria
• Genome-wide association studies (GWAS, i.e., an analysis comparing the allele frequencies of all available polymorphic markers in unrelated patients with a specific symptom or disease condition, and those of healthy controls to identify markers associated with a specific disease or condition)
• Target population is children or healthcare workers
2) Additional criteria used in full-text screening
Included strength of evidence (SOE) tools that evaluated one of the following outcomes:
• Longevity
• Vitality and healthy or successful aging
• Disease risk or disease incidence
Excluded SOE tools addressed outcomes related to:
• Disease prevalence
• Injury severity
• Efficacy or effectiveness of diagnostic tools, medical devices, or other assays
Four SOE tools describe more flexibility in the use of study design in determining ratings: the Community Preventive Services Task Force method [47] references study design and its "suitability for answering the research question"; the Grading System from the Academy of Nutrition and Dietetics [50] describes "studies of strong design for the question"; the Joanna Briggs Levels of Evidence identifies different levels of evidence under the separate headings effectiveness, diagnosis, prognosis, economic evaluations, or meaningfulness [52]; and the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence [53] uses a grid of five levels of evidence, where each level is specifically tailored to seven different kinds of research questions and supports a variety of combinations of quantity and quality of evidence depending on the specific research question.
With the exception of the OCEBM Levels of Evidence [53], specific mention of observational studies was made only in reference to their secondary contribution to overall SOE from RCTs, unless RCTs were methodologically flawed.

Table 3 (partial)
3 possible paths to a "Strong" rating (a):
• ≥2 studies with "good" execution, "greatest" design suitability, and consistent effect sizes of "sufficient" size
• ≥5 studies with "good" execution, "greatest or moderate" design suitability, and consistent effect sizes of "sufficient" size
• ≥5 studies with "good or fair" execution, "greatest" design suitability, and consistent effect sizes of "sufficient" size
It is possible for a prospective cohort study to fulfill the requirements for the "Greatest" rating. Specific study designs are not rigidly placed within the framework; the suitability for answering the research question is assessed in reference to potential threats to validity.

Conceptualization of SOE approach specific to lifestyle medicine
Upon completion of the systematic review, the expert panel convened to discuss the findings. The results confirmed that the following methodological elements within existing SOE tools in recent use are lacking:
1. Criteria to evaluate exposure-outcome relationships examined over years/decades/lifetimes
2. Criteria to evaluate behaviors/exposures used in lifestyle medicine that may not allow for randomization or blinding (e.g., smoking, long-term dietary patterns, etc.)
3. Guidance to synthesize findings from diverse study designs, except to prioritize RCTs over observational studies.
To address these issues, the Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM) investigators enumerated the particular contributions of diverse research methods into a complete understanding of exposure/treatment effects, as shown in Table 4.
Based on this simple framework, a new method for selecting the criteria by which SOE can be assessed was developed, titled Evidence Threshold Pathway Mapping (Table 5). It is intended to formalize and make explicit the decision process of which method or tool to use to evaluate SOE. With strength defined operationally as the relevant "threshold" value for some level of confidence, this potential methodologic innovation offers an opportunity to identify the assemblies of evidence that are most appropriate for a given research question, such as change in intermediate risk factors, short-term alleviation of disease symptoms, long-term improvement in diagnosed disease, or long-term prevention. The basic propositions underlying Evidence Threshold Pathway Mapping are that (a) different methods of research are best suited for making different yet complementary contributions to the overall weight of relevant evidence, and (b) different assemblies of evidence can produce the same aggregate strength or confidence. We recognize that in the absence of RCT data for treatment effects, certainty about treatment effects from other types of evidence may be more limited; thus, there is a basis to weight the contributions of RCTs preferentially. However, other types of evidence may still offer a spectrum of certainty or additional context for understanding.
Also implicit in this approach is the contention that various research methods serve different objectives related to evidence about a causal pathway. Bench science and animal model studies are most often used to establish clear and decisive evidence of mechanisms but cannot establish in vivo effects in humans [29]. Controlled intervention studies, and most notably RCTs, are used to establish attribution with confidence, while minimizing bias and controlling for both known and unknown confounders [17].
However, RCTs are not always ethically or practically feasible, and they are demanding to implement at the population level or over time periods relevant to lifetime vitality [30]. They also can introduce sampling bias that may greatly limit generalizability or external validity. Observational epidemiology, notably prospective cohort studies and even ethnographic studies, can readily assess associations at scale and over extended time periods (decades), but these are subject to bias, including sampling bias and residual confounding, and they lack the capacity of RCTs to assign attribution with clarity [30].
Accordingly, evidence is strongest when the unique contributions of these diverse methods are synthesized. Making conclusions by drawing from a diversity of evidence sources can potentially allow for confidence in study design methods from one type of research, confidence in attribution from another type, confidence in effects at scale from yet another, and confidence in effects over extended timelines from another still. This amalgamation of complementary evidence is especially important when research questions cannot be readily answered by one study design alone (e.g., What dietary pattern produces the best health outcomes over a lifetime?) [55]. Such considerations are a subject of active discussion in nutrition research [56,57].
Thus, we introduce a new construct, Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM), shown in Table 6, to illustrate a means of assessing SOE in future systematic reviews within the domain of lifestyle medicine when the use of GRADE or another SOE tool is not appropriate.
HEALM incorporates the variety of sources of evidence available and synthesizes their contributions into one rating. It is important to note that the method described in Tables 5-6 suggests one specific framework for handling a set of considerations around SOE. Alternative ways of handling such considerations include using a conventionally defined tool such as GRADE, not utilizing a predetermined scoring system, or uniquely adapting an existing tool to the research question being asked. We introduce Evidence Threshold Pathway Mapping and HEALM to illustrate one example of a suitable, customized approach for researchers in lifestyle medicine that can be applied, tested, and validated in practice. The proposed approach for evaluating SOE is informed by the flexibility and specificity presented in OCEBM [53]. HEALM adapts this approach to the specific exigencies of lifestyle medicine, while placing an emphasis on the alignment of research methods with specific questions related to causal pathways. To identify when use of such a tool might be appropriate, we suggest employing Evidence Threshold Pathway Mapping (Table 5) to map the pathway for evidence evaluation along the branches of a simple decision tree. For example, this process produces a suggestion to use the HEALM tool for all research questions concerning lifetime cumulative effects of specific health behaviors, as lifetime effects cannot be assessed in < 5 years. However, it suggests using GRADE [35] for other questions that are feasibly answered with RCTs.
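The two example outcomes of the decision tree described above can be written as a small function. This is only a sketch of those two branches, not a rendering of Table 5, which may contain further steps; the function name, its two boolean inputs, and the treatment of questions that are neither lifetime-focused nor RCT-feasible are assumptions made for illustration.

```python
def suggest_soe_tool(concerns_lifetime_effects: bool, answerable_by_rcts: bool) -> str:
    """Hypothetical two-branch reading of Evidence Threshold Pathway Mapping.

    Assumption: questions about lifetime cumulative effects, or questions that
    RCTs cannot feasibly answer, are routed to HEALM; all others to GRADE.
    """
    if concerns_lifetime_effects or not answerable_by_rcts:
        return "HEALM"
    return "GRADE"


# A lifetime dietary-pattern question versus a short-term question suited to RCTs.
print(suggest_soe_tool(concerns_lifetime_effects=True, answerable_by_rcts=False))   # HEALM
print(suggest_soe_tool(concerns_lifetime_effects=False, answerable_by_rcts=True))   # GRADE
```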

Discussion
Lifestyle behaviors are among the leading determinants of health outcomes, with non-communicable disease   [15], and there is intense debate about the strength and reliability of pertinent evidence [1][2][3]. The majority of current systems for evaluating scientific evidence are well-suited to evaluating pharmaceutical approaches to managing disease, but currently a system for evaluating SOE particular to lifestyle medicine does not exist. Assessment of SOE requires grading the methodological quality and ROB of individual included studies, assessing the consistency and internal validity of studies addressing a specific research question, and forming conclusion statements. Such SOE conclusions can thus inform the discussion on the weight of evidence, informed by multiple studies providing for external validity or generalizability to various populations, settings, and circumstances.
Evidence Threshold Pathway Mapping contends that the same level of confidence, and the same strength of evidence, can be achieved by a variety of assemblies of evidence. The approach respects the unique value of RCTs in establishing attribution and does not assume RCTs are interchangeable with other study designs. Rather, Evidence Threshold Pathway Mapping acknowledges that RCTs may be precluded for various reasons with regard to a given outcome and that other complementary evidence should be considered. Even then, such trials may contribute to understanding by assessing attribution with the use of interim measures and/or surrogate markers. This method of identifying the SOE approach used for evaluation based on the nature of the question being asked is informed by the approach taken in the OCEBM tool [53], which tailors SOE evaluation for different types of research questions.

Table 6: Hierarchies of Evidence Applied to Lifestyle Medicine (HEALM)*
HEALM contains three scoring** levels of SOE: Grade A (Strong/decisive); Grade B (Moderate/suggestive); Grade C (Insufficient/inconclusive)
Q1: Are there established mechanisms of action? (a plurality*** of evidence from bench science and animal models)
Yes = 2; Uncertain*** = 1; No = 0
Q2: Are there intervention studies in people that provide evidence of causality/attribution? (a plurality*** of high-quality intervention trials, randomized controlled trials, interim measures, and surrogate markers as outcomes)
Yes = 3; Uncertain = 1; No = 0
Q3: Are there observational studies to establish generalizability to large populations? (a plurality*** of high-quality evidence from large, prospective cohort studies)
Yes = 2; Uncertain = 1; No = 0
Q4: Are there observational studies to support effects over time periods measured in decades, lifetimes, or generations? (a plurality*** of evidence from high-quality, long-term observational studies; retrospective cohort studies; ethnography; transcultural studies)
Yes = 2; Uncertain = 1; No = 0
Grade A: Strong evidence = ≥7 (this would require decisive evidence in all other categories AND at least suggestive evidence from intervention trials in people; OR strong evidence from intervention trials in people and decisive evidence in other two categories; OR strong evidence from intervention trials, decisive evidence in any other category, and suggestive evidence in the remaining two). Lends a primacy to RCT evidence but allows for strong evidence even with nothing more than suggestive evidence in intervention trial category.
Grade B: Moderate/suggestive = 5 or 6. Achievable with decisive intervention trial evidence and strong evidence in ANY other category, OR strong evidence in all categories other than intervention trials.
Grade C: Insufficient/weak = <5
*The HEALM tool is presented here to illustrate potential approaches to scoring evidence across research categories; it does not represent the single, specific approach recommended by the project expert panel on the basis of a formal consensus process.
**Scoring: Answers to scoring questions should be based on expert consensus in evaluating available evidence. Evidence is conclusive when it can be identified as sufficient in quantity and quality, and consistent in findings, fostering clear consensus among experts. This would generally mean a replicated finding, and consistent effects among a clear plurality*** of high-quality, related publications. Evidence is uncertain when studies are few, small, poor quality, or conflicting, but generally suggestive of a particular finding. While expert consensus is critical in evaluation, a framework to inform discussion based on quantitative criteria used in previous umbrella reviews [56] is suggested:
1. Total sample and number of cases of included studies
2. Significance of association based on p-values (highly significant defined as p < 0.0001 vs. nominally significant defined as p < 0.05) and confidence intervals that exclude vs. include the null value
3. When considering studies that include meta-analyses, a target threshold of 1000 cases, no evidence of small-study effects or excess significance bias, a 95% prediction interval excluding the null value and no large, unexplained, between-study heterogeneity (I² < 50%)
***Plurality may vary depending on the total number of existing studies conducted on a particular research question and must be determined on a case-by-case basis. For example, three consistent studies from a variety of study designs with no opposing studies may constitute a plurality. Were there to be opposing studies, the target number would be more than three. A clear numerical plurality of studies but with overall poor quality may constitute a rating of "Uncertain".
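The point values and grade cut-offs in the HEALM table can be expressed as a short scoring sketch. The points per answer and the thresholds (≥7, 5 or 6, <5) are taken from the table; the function names and question keys are hypothetical labels introduced here, and the yes/uncertain/no answers themselves would still come from expert consensus rather than from code.

```python
from typing import Dict

# Points per answer for each HEALM question, as listed in the table.
# The dictionary keys are illustrative labels, not names used by the authors.
POINTS = {
    "Q1_mechanisms":    {"yes": 2, "uncertain": 1, "no": 0},
    "Q2_intervention":  {"yes": 3, "uncertain": 1, "no": 0},
    "Q3_generalizable": {"yes": 2, "uncertain": 1, "no": 0},
    "Q4_long_term":     {"yes": 2, "uncertain": 1, "no": 0},
}


def healm_score(answers: Dict[str, str]) -> int:
    """Sum the points for the four answers (each 'yes', 'uncertain', or 'no')."""
    missing = set(POINTS) - set(answers)
    if missing:
        raise ValueError(f"missing answers for: {sorted(missing)}")
    return sum(POINTS[q][answers[q].lower()] for q in POINTS)


def healm_grade(score: int) -> str:
    """Map a total score to Grade A (>= 7), Grade B (5 or 6), or Grade C (< 5)."""
    if score >= 7:
        return "A"
    if score >= 5:
        return "B"
    return "C"


# Decisive evidence everywhere except intervention trials, which are only suggestive:
# 2 + 1 + 2 + 2 = 7, so Grade A.
example = {
    "Q1_mechanisms": "yes",
    "Q2_intervention": "uncertain",
    "Q3_generalizable": "yes",
    "Q4_long_term": "yes",
}
print(healm_score(example), healm_grade(healm_score(example)))  # 7 A
```

The maximum total is 9, and a Grade A is reachable with no more than suggestive intervention evidence only when the other three categories all score 2, which matches the first path described for Grade A.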
HEALM, derived from application of the Evidence Threshold Pathway Mapping approach, is one unique, potential approach organized to frame discussion of existing evidence available to answer specific research questions relevant to lifestyle medicine when existing tools such as GRADE are not viable options (i.e., the question is not fully addressable through RCTs). The scoring, similar to other SOE tools, relies on expert consensus, but is also informed by quantitative scoring considerations used in umbrella reviews [59] to evaluate results from multiple meta-analyses. While grading SOE does not necessarily mean meta-analyses will always be conducted, a quantitative framework to guide discussion will lead to greater consistency of results. HEALM defines categorical levels of SOE, as is conventionally done when evaluating evidence. However, it should be noted that such categories are derived from a continuum of SOE and that the value of the categories is to increase the utility of the tool for communicating findings. The intended purpose of HEALM is to evaluate SOE, which can then be used to develop strength of recommendation-based practice statements. The construct first introduced here may gain traction as is; it may be revised and refined by others; or it may be replaced outright if an alternative metric serving the same goals performs better.
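The quantitative considerations listed in the HEALM scoring notes (p-value bands, a 1000-case target, absence of small-study effects and excess significance bias, a 95% prediction interval excluding the null, and I² < 50%) could be tabulated for a single meta-analysis roughly as follows. This is a sketch to support panel discussion, not a grading rule: the class and field names are hypothetical, reading the 1000-case target as "at least 1000" is an assumption, and the "not significant" label is added here only to make the function total.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MetaAnalysisSummary:
    """Hypothetical container for the summary values the criteria refer to."""
    n_cases: int
    p_value: float
    prediction_interval_95: Tuple[float, float]
    null_value: float               # e.g. 1.0 for ratio measures, 0.0 for differences
    small_study_effects: bool
    excess_significance_bias: bool
    i_squared_percent: float


def significance_label(p_value: float) -> str:
    """Band a p-value using the two thresholds named in the scoring notes."""
    if p_value < 0.0001:
        return "highly significant"
    if p_value < 0.05:
        return "nominally significant"
    return "not significant"  # label assumed here; not defined in the source


def criteria_checklist(m: MetaAnalysisSummary) -> Dict[str, bool]:
    """Report which meta-analysis criteria are met; no overall grade is computed."""
    low, high = m.prediction_interval_95
    return {
        "at_least_1000_cases": m.n_cases >= 1000,  # assumption: target read as >= 1000
        "no_small_study_effects": not m.small_study_effects,
        "no_excess_significance_bias": not m.excess_significance_bias,
        "prediction_interval_excludes_null": not (low <= m.null_value <= high),
        "heterogeneity_below_50_percent": m.i_squared_percent < 50,
    }


# Made-up values for illustration only.
summary = MetaAnalysisSummary(1500, 0.00001, (0.70, 0.90), 1.0, False, False, 30.0)
print(significance_label(summary.p_value))
print(criteria_checklist(summary))
```

Returning a checklist rather than a single verdict keeps the final judgment with the expert panel, in line with the statement that the quantitative framework guides discussion.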
The need for innovation in SOE assessment arises in part because the RCT holds a position of relative primacy in the adjudication of medical evidence. Arguments favoring reliance on RCTs rightly invoke the merits of this methodology, namely defense against diverse kinds of bias and protection against confounders both known and unknown [17], thus prioritizing internal validity. There are, however, diverse and valid concerns with the limitations of RCTs [30] in achieving external validity.
Also of concern are the cases in which observational and intervention trial results appear to be in conflict with one another. In some cases, RCTs may be testing different hypotheses than observational studies, and conclusions from one investigation may not be generalizable to all populations. For example, a review analysis on the use of hormone replacement therapy (HRT) among women in the Women's Health Initiative affirms the consistency of findings across observational and intervention data if the age at time of starting HRT is considered [60].
A recent Cochrane systematic review concluded such differences are likely not due to differences in study design alone; rather, RCTs and observational studies tend to produce similar effect sizes for a range of health outcomes and disagreements are likely due to other study characteristics [61] such as testing different hypotheses [60] or duration of follow-up. While there are examples of RCTs that document outcomes after several years of follow-up post-intervention [62,63], the challenges of adherence [27] severely limit feasibility of continuous interventions over decades. To the authors' knowledge, there are no RCTs that have successfully and continuously implemented an intervention, especially one with a potentially small effect size, for the decades necessary to test "lifetime" effects. Thus, the prevailing impression that results from RCTs are consistently superior may be exaggerated, with the benefits and risks of hormone replacement therapy providing an example of the partial contributions to understanding made possible by both RCTs and observational cohort studies [64][65][66][67].
In contrast, there are clear cases in which observational studies offer a superior method of evaluating questions concerning the cumulative, lifetime effects of lifestyle practices. A key example of a trial whose recruitment was designed to maximize the number of endpoints is the Alpha-Tocopherol, Beta-carotene Lung Cancer Prevention (ATBC) Study, which targeted male smokers [28]. Short-term RCTs would be of insufficient duration to capture hard endpoints such as cancer and cancer-related mortality, and randomizing participants to exposures like smoking would be impossible for ethical reasons.
The HEALM tool scores evidence, lending particular weight to RCTs for the clarification of causal effects and attribution. The tool, however, allows for rating evidence as strong even if RCT data are not more than suggestive, provided evidence from all other complementary research approaches is decisive and aligned. More importantly, short-term evidence from RCTs, or a focus on isolated biomarkers, absent any suitably long-term data addressing hard outcomes, would not score as "strong" in the realm of lifestyle medicine because of the great potential divide between short- and long-term effects. As an example, many serious infectious diseases lower weight and blood lipids; such "favorable" trends in biomarkers are obviously not indicative of beneficial health effects in the long term. This adaptation of established approaches readily accommodates the imperative of judging the impact of lifestyle practices on health outcomes over the full human life span.
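The two conditions described in this paragraph can be illustrated with a minimal sketch. It is not the HEALM scoring procedure itself: the names `Support`, `EvidenceProfile`, and `may_rate_strong` are hypothetical, the three-level scale is a simplification, and the treatment of decisive RCT evidence as sufficient when long-term hard-outcome data exist is an assumption rather than a rule stated here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical ordinal scale for how strongly one research method supports an effect."""
    NONE = 0
    SUGGESTIVE = 1
    DECISIVE = 2


@dataclass
class EvidenceProfile:
    rct: Support                   # support from randomized controlled trials
    other_methods: list[Support]   # support from each complementary research approach
    other_methods_aligned: bool    # whether the complementary approaches agree in direction
    long_term_hard_outcomes: bool  # whether suitably long-term data on hard outcomes exist


def may_rate_strong(profile: EvidenceProfile) -> bool:
    """Return True if the profile could be rated "strong" under the two rules sketched here."""
    # Short-term or biomarker-only evidence cannot score as "strong".
    if not profile.long_term_hard_outcomes:
        return False
    # Assumption (not stated in the text): decisive RCT evidence suffices at this point.
    if profile.rct is Support.DECISIVE:
        return True
    # RCT data are no more than suggestive: every other approach must be decisive and aligned.
    return (
        bool(profile.other_methods)
        and profile.other_methods_aligned
        and all(s is Support.DECISIVE for s in profile.other_methods)
    )
```

Under these assumptions, a profile with suggestive RCT data, decisive and aligned complementary evidence, and long-term hard-outcome data would qualify, while the same profile without long-term data would not.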
A strength of this study is its use of a methodological systematic review to capture existing and recently used SOE tools, thus ensuring that the proposed method offers a novel contribution that addresses current methodological gaps. Limitations of this study include the focus in the search strategy on healthy aging as an outcome, rather than risk for specific chronic diseases. The search strategy was constructed in alignment with the target outcomes of lifestyle medicine practice (healthy aging, as opposed to chronic disease), and inclusion of all major chronic disease outcomes would not have been practical due to the large number of search results. Additionally, the search strategy was limited to systematic reviews of studies conducted among those ≥65 years, not because lifestyle medicine is only relevant for older populations, but because this focused the search strategy on studies in the domain of longevity. SOE tools used in these contexts would be potential best matches for evaluating evidence concerning other lifestyle medicine-type questions. However, manual searching for SOE tools based on expert panel recommendations augmented the systematic review results to the degree that all major tools known to the expert panel are included in our results.
Finally, the HEALM construct is dependent on conclusions about the "plurality" of evidence from distinct research methods. Other than results produced from systematic review or meta-analysis, there is no universal standard for sufficient or sufficiently consistent evidence to establish the veracity of a given causal pathway or the weight of the evidence for a given research topic. Even meta-analyses and systematic reviews fail to reach this standard, because in "crowded" research domains more than one such study is common and they may conflict with one another. The Community Preventive Services Task Force (CPSTF) [47] provides some guidance on assigning strength of recommendations based on SOE conclusions by suggesting that inconsistent evidence should lead to separate recommendations for specific populations, and that no conclusions should be reached in the case of conflicting evidence. However, this guidance does not provide a framework for synthesizing strength and weight of evidence more broadly. Further, a limitation of HEALM is that it utilizes categories to assign relative levels of confidence, though this limitation is common to existing SOE tools.
The problem of establishing an operational definition for the "weight of evidence," or a decisive plurality of studies, is in no way specific to lifestyle medicine. This is a generic challenge pertaining to all assessments of overall evidence, and thus deemed beyond the scope of this particular effort. This group simply notes the importance of this issue, and its pertinence to both Evidence Threshold Pathway Mapping and HEALM. This paper invites attention to the matter and highlights the opportunity to fortify operational definitions in this area.
This project was commissioned with a preferential focus on lifestyle medicine, but the implications apply broadly to public health. Lifestyle practices and exposures (dietary patterns, physical activity patterns, sleep patterns, tobacco and alcohol exposures, psychological stressors, and social connections), while uniquely emphasized in lifestyle medicine [4], pertain to all fields of medicine and public health and to all health professionals.
Future research should test application of Evidence Threshold Pathway Mapping and HEALM by conducting systematic reviews on specific research questions in the domain of lifestyle medicine. The HEALM construct should evolve, informed by research in which it is applied.

Disclaimer
The views expressed are those of the authors and do not reflect an official position of the National Cancer Institute or National Institutes of Health.
Authors' contributions
DK and WW conceived of the research question. MK, MSW, and DK developed the study design. MK, MSW, and AS performed the systematic review, with supervision from MC. DK, MC, LG, JL, and WW interpreted the data and formed conclusions. All authors contributed to the writing and editing of the manuscript.

Funding
Funding was provided by the American College of Lifestyle Medicine (MCK, MMSW). Expert panel members (LWG, JF, MC, and WW) received honoraria from the American College of Lifestyle Medicine for their contributions to this work. Additional funding was provided by the Centers for Disease Control, grant 5U48DP005023-04 (DK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.