Table 3 Methodological quality of included studies based on COSMIN risk of bias (RoB) checklist

Columns: Tool or dimension; Report; Content validity; Internal structure (Structural validity, Internal consistency, Cross-cultural validity); Remaining measurement properties (Reliability, Measurement error, Criterion validity, Construct validity)
“Applicability”-dimension of LEGEND Clark et al. [77] doubtful           
“Applicability”-dimension of Carr’s evidence-grading scheme Carr et al. [63] inadequate           
Bornhöft’s checklist Bornhöft et al. [78] inadequate           
Clegg’s external validity assessment Clegg et al. [64] inadequate           
Clinical Applicability Haraldsson et al. [66] inadequate           
Clinical Relevance Instrument Cho & Bero [79] doubtful doubtful doubtful doubtful        
Cho & Bero [80]         adequate    
Clinical Relevance according to the CCBRG Van Tulder et al. [81] inadequate doubtful doubtful doubtful        
Clinical relevance scores (Karjalainen’s) Karjalainen et al. [68] inadequate           
Estrada’s applicability assessment criteria Estrada et al. [82] doubtful           
EVAT Khorsan & Crawford [83] doubtful           
“External validity”-dimension of the Downs & Black Checklist Downs & Black [22] doubtful doubtful doubtful doubtful   doubtful   very goodᵃ inadequateᵃ   adequate
O’Connor et al. [84]         very good    
“External validity”-dimension of Foy’s quality checklist Foy et al. [65] inadequate           
“External validity”-dimension of Liberati’s quality assessment criteria Liberati et al. [69] inadequate           
“External validity”-dimension of Sorg’s checklist Sorg et al. [71] inadequate           
“External validity”-criteria of the USPSTF USPSTF manual [73] inadequate           
O’Connor et al. [84]         very good    
FAME scale Averis et al. [70] inadequate           
GAP checklist Fernandez-Hermida et al. [76] inadequate           
Gartlehner’s tool Gartlehner et al. [86] inadequate        very good adequate adequate  
Zettler et al. [87]         very good    
Green & Glasgow’s external validity quality rating criteria Green & Glasgow [88] inadequate           
Laws et al. [91]            doubtful
Mirza et al. [90]         adequate    doubtful
“Indirectness”-dimension from the GRADE Handbook [92] Atkins et al. [48] adequate           
Wu et al. [93]         inadequate    
Loyka’s external validity framework Loyka et al. [75] doubtful         adequate   
modified “Indirectness” of the Checklist for GRADE Meader et al. [94] adequate        adequateᵇ    
Llewellyn et al. [95]           
External validity checklist of the NHMRC Handbook NHMRC Handbook [74] inadequate           
revised GATE in the NICE manual NICE Guideline [72] inadequate           
RITES tool Wieland et al. [47] adequate adequate very good very good        
Aves et al. [97, 101]         inadequate    very good
“Selection Bias”-dimension (Section A) of the EPHPP tool Thomas et al. [98] inadequate doubtful doubtful doubtful     doubtful    doubtful
Armijo-Olivo et al. [99]         doubtful    
Section D of the CASP checklist for RCTs Critical Appraisal Skills Programme [100] inadequate           
Whole Systems research considerations’ checklist Hawk et al. [67] inadequate           
  1. Fields left blank indicate that those measurement properties were not assessed by the study authors
  2. Abbreviations: CB comprehensibility; RE relevance; CV comprehensiveness; CCBRG Cochrane Collaboration Back Review Group; EPHPP Effective Public Health Practice Project; EVAT External Validity Assessment Tool; FAME Feasibility, Appropriateness, Meaningfulness and Effectiveness; GAP Generalizability, Applicability and Predictability; GATE Graphical Appraisal Tool for Epidemiological Studies; GRADE Grading of Recommendations Assessment, Development and Evaluation; LEGEND Let Evidence Guide Every New Decision; NHMRC National Health & Medical Research Council; NICE National Institute for Health and Care Excellence; RITES Rating of Included Trials on the Efficacy-Effectiveness Spectrum; USPSTF U.S. Preventive Services Task Force
  3. ᵃ Two studies on reliability (test-retest and inter-rater reliability) reported in the same article
  4. ᵇ Results from the same reliability study reported in two articles [94, 95]