The development process of the CAT HPPR was specified in advance in a research protocol written in German (see Availability of data and materials). Our reporting of this process follows the recommendations of Whiting et al. for developing quality assessment tools (“Stage 2: tool development”) [42]. Documentation of the search and selection process used to identify CATs, reporting guidelines and items is appended to this article.
Search for retrieving reporting guidelines and standards to define review types
To develop the CAT HPPR, we first conducted a pragmatic search in July 2019 of relevant websites (including the EQUATOR Network, Cochrane, JBI), an electronic database (Medline) and references provided by members of the larger project team to identify relevant reporting guidelines/standards (n = 3) [1, 2, 9]. We then collated further guidance documents on conducting reviews within the scope of the tool (systematic reviews, rapid reviews, scoping reviews) and on optional complementary review approaches (review of reviews, also known as overview of reviews; mixed-methods approach; meta-analysis) (n = 17) [5,6,7, 41, 43,44,45,46,47,48,49,50,51,52,53,54,55].
Consensus approach to define review types
Based on the identified documents, narrow working definitions of the review types were developed and further refined with the involvement of all project partners (tool developers and members of a project-specific reviewer pool who piloted the novel CAT). Tool developers comprised all authors of this article (n = 8). Members of the reviewer pool (n = 3) were experienced review authors with methodological knowledge beyond systematic and Cochrane reviews, recruited and commissioned by the “GKV-Bündnis für Gesundheit” (see Acknowledgements). Given the lack of consensus in the scientific literature regarding the different types of reviews and their complementary approaches, this step was crucial for improving the applicability of the to-be-developed appraisal tool, its criteria and the global rating algorithm. Review types that are methodologically less narrowly defined (e.g. overviews) or that overlapped very strongly with the types and approaches we had already defined (e.g. mapping reviews, umbrella reviews) were not considered separately [41].
Search for retrieving CATs to inform items
As a further step towards identifying and tailoring relevant content of pre-existing CATs and their criteria for our tool, we carried out an electronic search using the same approach as for reporting standards (i.e. Medline, websites, literature provided by project partners). CATs were eligible for inclusion if they were originally developed for review articles, were question- or item-based, were applicable to general medical or health topics, and had corresponding guidance documents readily available.
Compiling initial list of items for inclusion
We assessed 30 full texts of CATs and other review evaluation instruments for eligibility. CATs were excluded because they were not exclusively developed to assess reviews (n = 1 [31]), were mainly developed for training of practitioners (n = 3 [18, 19, 23]), were developed for a certain medical field (n = 1 [25]), had no or limited guidance available (n = 3 [11, 21, 32]), were developed to assess the relevance of review findings (n = 2 [16, 26]), or were not considered for data extraction because the main report suggested strong overlap with another established CAT (n = 2 [22, 27]). As a result, 14 CATs [13,14,15, 20, 24, 28,29,30, 33,34,35,36,37,38] based on 18 reports were considered eligible for item identification. The included CATs were mainly developed for the quality assessment of systematic reviews. Since the CAT of healthevidence.org was closest in aim and content to our to-be-developed CAT [14], the individual criteria of this tool were extracted first and then compared with the criteria extracted from the remaining 13 CATs. Extracted data were checked by a second tool developer. Criteria that duplicated the wording or content of criteria in other CATs were removed.
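The selection figures reported above can be cross-checked with a short tally. The following is only an illustrative sketch: the counts are taken from the text, whereas the dictionary labels and variable names are our own shorthand, and the check assumes that each exclusion corresponds to exactly one assessed full text.

```python
# Counts as reported in the text; labels and variable names are illustrative.
full_texts_assessed = 30

exclusions = {
    "not exclusively developed to assess reviews": 1,
    "mainly developed for training of practitioners": 3,
    "developed for a certain medical field": 1,
    "no or limited guidance available": 3,
    "developed to assess relevance of review findings": 2,
    "strong overlap with another established CAT": 2,
}

# Assumption: one exclusion corresponds to one assessed full text.
excluded_total = sum(exclusions.values())
reports_retained = full_texts_assessed - excluded_total

print(f"excluded: {excluded_total}, retained reports: {reports_retained}")
```

Under this assumption, the number of retained full texts agrees with the 18 reports on which the 14 included CATs are based.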
Initial items and scope
A review of all criteria, including discussion among and consensus decisions by the tool developers, reduced the number of identified individual criteria from 46 to 15. Criteria were excluded for the following reasons: strong overlap with items of healthevidence.org (n = 11; i.e. similar wording), limited relevance for the quality of review findings (n = 16; e.g. “Were directions for future research proposed?” [36]), and limited potential for replicable assessments (n = 4; e.g. “Date of review – is it likely to be out of date?” [35]). The overall aim was to identify items that were comprehensive, relevant and objectively appraisable. Given some overlap between individual criteria in the set of extracted criteria, a factor analysis, as performed by the developers of the original AMSTAR tool [34], was not undertaken. Instead, we extended some criteria with objectively appraisable content during further internal revisions of the tool (see Manual; coding boxes).
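The reduction in criteria can be reproduced from the reported exclusion counts. This is a minimal sketch for checking the arithmetic only; the function name `remaining_criteria` and the reason labels are hypothetical and not part of the CAT HPPR materials.

```python
def remaining_criteria(initial: int, removed_by_reason: dict[str, int]) -> int:
    """Return the number of criteria left after subtracting all exclusions."""
    removed = sum(removed_by_reason.values())
    if removed > initial:
        raise ValueError("more criteria removed than were identified")
    return initial - removed


# Counts as reported in the text; the labels are shorthand for the stated reasons.
removed_by_reason = {
    "strong overlap with healthevidence.org items": 11,
    "limited relevance for quality of review findings": 16,
    "limited potential for replicable assessments": 4,
}

final_count = remaining_criteria(46, removed_by_reason)
print(final_count)
```

The result matches the 15 criteria carried forward into the first draft of the tool.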
First draft of CAT HPPR and guidance development
We also used the reporting guidelines/standards and the guidance documents for reviews to set the basic requirements that a review must fulfil for each criterion, and we developed further guidance to help users reach a judgement. A global rating system was introduced to combine the information gained from all 15 criteria.
Piloting and refinement
Finally, a first version of the CAT HPPR was piloted with 14 reviews. Feedback and requests for further clarification from intended users of the tool’s assessments and from experts of the project-specific reviewer pool led to final adjustments of the tool [40]. The feedback and requests were based on completed assessments across all major review types for which the CAT HPPR was originally designed (SR: n = 2, RR: n = 2, ScR: n = 10). As a result, a review-type-specific algorithm was introduced into the global rating system to take better account of the methodological advantages and disadvantages of individual review types. Among other things, the “Risk of Bias Assessment” was thereby highlighted as a basic requirement and quality feature of systematic reviews compared with other review types. At the end of the piloting stage, we requested informal feedback from CAT HPPR users on processing time (not formally timed) and on overall satisfaction with the scope and applicability of the CAT HPPR and its guidance documents.