How are systematic reviews of prevalence conducted? A methodological study.

BACKGROUND
There is a notable lack of methodological and reporting guidance for systematic reviews of prevalence data. This information void has the potential to result in reviews that are inconsistent and inadequate to inform healthcare policy and decision making. The aim of this meta-epidemiological study is to describe the methodology of recently published prevalence systematic reviews.


METHODS
We searched MEDLINE (via PubMed) from February 2017 to February 2018 for systematic reviews of prevalence studies. We included systematic reviews assessing the prevalence of any clinical condition using patients as the unit of measurement and we summarized data related to reporting and methodology of the reviews.


RESULTS
A total of 235 systematic reviews of prevalence were analyzed. The median number of authors was 5 (interquartile range [IQR] 4-7), the median number of databases searched was 4 (3-6) and the median number of studies included in each review was 24 (IQR 15-41.5). Search strategies were presented for 68% of reviews. Forty five percent of reviews received external funding, and 24% did not provide funding information. Twenty three percent of included reviews had published or registered the systematic review protocol. Reporting guidelines were used in 72% of reviews. The quality of included studies was assessed in 80% of reviews. Nine reviews assessed the overall quality of evidence (4 using GRADE). Meta-analysis was conducted in 65% of reviews; 1% used Bayesian methods. Random effect meta-analysis was used in 94% of reviews; among them, 75% did not report the variance estimator used. Among the reviews with meta-analysis, 70% did not report how data was transformed; 59% percent conducted subgroup analysis, 38% conducted meta-regression and 2% estimated prediction interval; I2 was estimated in 95% of analysis. Publication bias was examined in 48%. The most common software used was STATA (55%).


CONCLUSIONS
Our results indicate that there are significant inconsistencies regarding how these reviews are conducted. Many of these differences arose in the assessment of methodological quality and the formal synthesis of comparable data. This variability indicates the need for clearer reporting standards and consensus on methodological guidance for systematic reviews of prevalence data.


Background
The proportion of a population currently suffering from a disease or particular condition of interest (prevalence) is an important [1] metric that allows researchers to assess disease burden, that is, who among the population is experiencing a certain disease, at a very specific point in time, typically measured using a cross-sectional study design [2]. The subsequent synthesis of this information in the form of a rigorously conducted and transparently reported systematic review has significant potential to better inform social and healthcare professionals, policy makers and consumers to better manage and plan for this disease burden [3].
The number of systematic reviews of prevalence data has increased steadily over the last decade [4]. As can be seen in Fig. 1, a search of PubMed, conducted in October 2019 using the terms "systematic review" and "prevalence" in the title, identified a more than ten-fold increase in the number of reviews published from 2007 to 2017.
Despite this uptake and increased interest in the scientific community in the conduct of a systematic review of prevalence data, there remains little discourse on how these reviews should be conducted and reported. Although methodological guidance from the Joanna Briggs Institute (JBI) exists for these review types, [5] there currently appears to be little other discussion on how these reviews should be prepared, performed (including when and how results can be pooled using meta-analytical methods) and reported. This is an important consideration, as standards of reporting and conduct for traditional systematic reviews (i.e. systematic reviews of interventions) [6] are considered commonplace, and even endorsed by journal editors. Even though such standards and guidelines could be adapted for systematic reviews of prevalence, they do not comprehend specific issues for this type of study, and have not been as readily adopted. This is despite the fact that there is now general acceptance that there needs to be different types of approaches for systematic reviews looking at different types of evidence [6][7][8].
This lack of accepted, readily adopted and easy to implement methodological and reporting guidance for systematic reviews of prevalence data has the potential to result in inconsistent, varied, inadequate and potentially biased research conduct. Due to the particular importance of this kind of data in enabling health researchers and policy makers to quantify disease amongst populations, such research conducted without appropriate guidance is likely to have far-reaching and complicated consequences for the wider public community.
As such, the objective of this research project was to conduct a meta-epidemiological review of a sample of systematic reviews that have been published in peer-reviewed journals and evaluated a question regarding the prevalence of a certain disease, symptom or condition. This project allowed us to investigate how these reviews are conducted and to provide an overview of all methods utilized by systematic reviews authors asking a question of prevalence. The results of this project can potentially inform the development of future guidance for this review type.

Search strategy
To retrieve potentially relevant reviews, we searched MEDLINE (via PubMed) using the terms 'systematic

Study selection and eligibility criteria
The selection of studies was conducted in two phases by two independent reviewers. First, we screened titles and abstracts. Then, we retrieved the full-text of potentially relevant studies to identify studies meeting our inclusion criteria.
We included systematic reviews of prevalence of any clinical condition, including diseases or symptoms. We excluded primary studies, letters, narrative reviews, systematic reviews of interventions or diagnostic accuracy, systematic reviews assessing the association between variables and systematic reviews of prevalence that did not use patients as the unit of measurement, as well as studies not published in English.

Data abstraction
Using a standard and piloted form, one author extracted relevant data from each review and another author independently checked all data. Discrepancies were discussed and solved by consensus or by a third reviewer.
We abstracted the following data from individual studies: general information about the paper (number of authors, journal and year of publication), reporting of funding, reporting of search strategy, number and description of databases consulted, number of reviewers involved in each step of the review (study screening, selection, inclusion and data extraction), number of studies included in the review, methods for risk of bias appraisal and quality of evidence assessment, synthesis of results, assessment of publication bias and details of meta-analytic processes if meta-analysis was conducted (including variance estimator, transformation of data, heterogeneity assessment, and software used).

Data analysis
Results are presented using descriptive statistics. Quantitative variables are presented as means and standard deviations or median and interquartile range, as appropriate; and qualitative variables are presented in absolute and relative frequencies.

Results
Our search resulted in 325 articles. We assessed the full text of 251 and included 235 of them in our analysis. Figure 2 presents the flowchart of study selection. The main characteristics of the included studies are described in Table 1. The complete list of included studies is presented in Additional file 1 and the list of full text excluded with reasons is presented in Additional file 2. The complete data extraction table is presented in Additional file 3.
Of 235 studies included, 83 (35.3%) presented narrative synthesis, without quantitative synthesis, while 152 (64.7%) performed meta-analysis. These 235 articles were published in 85 different journals. The leading publication journals (with 5 or more published reviews) were PLoS One, BMJ Open, Lancet Global Health and Medicine. The complete list of journals where included reviews were published is presented in Additional file 4.
One hundred and eighty-eight reviews (80.0%) assessed the methodological quality of included studies. One hundred and five (44.7%) reported that this assessment was performed by at least two reviewers. There was a great variability regarding the instruments used to critically appraise included studies. Twenty-four studies (10.2%) developed new tools, and 24 (10.2%) used nonvalidated tools from previous similar systematic reviews. Among validated and specific tools to assess prevalence studies, the JBI prevalence critical appraisal tool was the most used (n = 21, 8.9%). Fifteen reviews (6.4%) used STROBE, a reporting guideline, to assess the methodological quality of included studies. Thirty-two reviews (17.0%) used assessment of methodological quality as an inclusion criterion for the review.
Nine studies (3.8%) assessed the quality of the body of evidence, with GRADE being the highest cited methodology (n = 4, 1.7%). Not all methods used to assess the quality of evidence were validated and appropriate. For instance, one study summarized the quality of evidence as the mean STROBE score of included studies.

Statistical methods for meta-analysis
One hundred and fifty-two studies (64.7%) conducted meta-analysis to summarize prevalence estimates. The methods used by these reviews are summarized in Table 2.
The vast majority of studies (n = 151, 99.3%) used classic methods instead of Bayesian approaches to pool prevalence estimates. The majority of studies pooled estimates using a random effects model (n = 141, 93.4%), and 7 studies (4.6%) utilized the fixed effect model. Two studies (1.3%) used the 'quality model', where the weight of each study was calculated based on a quality assessment. However, both reviews used non-validated methods to critically appraise included studies.

Discussion
This meta-epidemiological study identified 235 systematic reviews addressing a question related to the prevalence of any clinical condition. Our investigations have found that across the included systematic reviews of prevalence, there are significant and important discrepancies in how these reviews consider searching, risk of bias and data synthesis. In line with our results, a recently published study assessed a random sample of 215 systematic reviews of prevalence and cumulative incidence, and also found great heterogeneity among the methods used to conduct these reviews [7]. In our view, the study conducted by Hoffmann et al. and our study are complimentary. For instance, in the first one, the authors included reviews published in any year and compared the characteristics of reviews published before or after 2015 and with or without metanalysis; in our study, we assessed other methodological characteristics of the included reviews, specially related to the conduction of meta-analysis.
One area for potential guidance to inform future systematic reviews of prevalence data is in the risk of bias Adds to more than 100% because some reviews were counted in more than one option Adds to more than 100% because some reviews were counted in more than one option or critical appraisal stage. As can be seen from the results of our review, there are a number of checklists and tools that have been used. Some of the tools utilized were not appropriate for this assessment, such as the Newcastle Ottawa Scale (designed for cohort and casecontrol studies) [8] or STROBE (a reporting standard) [9]. Interestingly, some of the tools identified were designed specifically for studies reporting prevalence information, whilst other tools were adapted for this purpose, with the most adaptions to any one tool (n = 13, 5.5%) being made to the Newcastle Ottawa Scale. This is particularly concerning when reviews have used results of quality assessment to determine inclusion in the review, as was the case in 17% of the included studies. The combination of (a) using inadequate or inappropriate critical appraisal tools and (b) using the results of these tools to decide upon inclusion in a systematic review could lead to the inappropriate exclusion of relevant studies, which can alter the final results and produce misleading estimates. As such, there is an urgent need for the development and validation of a tool for assessing prevalence, along with endorsement and acceptance by the community to assist with standardization in this field. In the meantime, we urge reviewers to refer to the JBI critical appraisal tool, [4] which has been formally evaluated and is increasingly used across these types of reviews. Encouragingly, 72.3% of the included reviews adhered to or cited a reporting guideline in their review. The main reporting guideline reported was the PRISMA statement [6]. However, this reporting guideline was designed particularly for reviews of interventions of therapies. As such, there have been multiple extensions to the original PRISMA statement for various review types [10,11], yet no extension has yet been considered for systematic reviews of prevalence data. To ensure there is a reporting standard for use in prevalence systematic reviews, an extension or a broader version of the PRISMA statement including items important for this review type is recommended.
It was encouraging to see that multiple databases were often searched during the systematic review process, which is a recommendation for all review types. However, it is also important in systematic reviews that all the evidence is identified, and in the case of prevalence information, it may be particularly useful to search for data in unpublished sources, such as clinical registries, government reports, census data, and national administrative datasets, for example [12,13]. However, there are no clearly established procedures on how to deal with this kind of information. Further guidance on searching for evidence in prevalence reviews is required.
In our review, we found 64.7% of reviewers conducted meta-analysis. There has been debate regarding the appropriateness of meta-analysis within systematic reviews of prevalence, [14,15] [16][17][18] largely surrounding whether synthesizing across different populations is appropriate, as we reasonably expect prevalence rates to vary across different contexts and where different diagnostic criteria may be employed. However, metaanalysis, when done appropriately and using correct methodology, can provide important information regarding the burden of disease, including identifying differences amongst populations and regions, changes over time, and can provide a summarized estimate that can be used when calculating baseline risk, such as in GRADE summary of findings tables.
The vast majority of meta-analyses used classic methods and a random effects model, which is appropriate in these types of reviews [17] [18]. Although the common use of random effects across the reviews is encouraging, this is where the consistency ends, as we once again see considerable variability in the choice of methods to transform prevalence estimates for proportional meta-analysis. The most widely used approach was the Freeman-Tukey double arscine transformation. This has been recommended as the preferred methods for transformation [15,18], although more recently it has come under question [19]. As such, further guidance and investigation into meta-analytical techniques is urgently required.
In reviews including meta-analysis, heterogeneity was assessed with the I 2 in 94.7% of the included studies. Although I 2 provides a useful indication of statistical heterogeneity amongst studies included in a meta-analysis, it can be misleading in cases where studies are providing large datasets with precise confidence intervals (such as in prevalence reviews) [20]. Other assessments of heterogeneity, such as T 2 and prediction intervals, may be more appropriate in these types of reviews [20,21]. Prediction intervals include the expected range of true effects in similar studies [22]. This is a more conservative way to incorporate uncertainty in the analysis when true heterogeneity is expected; however, it is still underused in meta-analysis of prevalence. Further guidance on assessing heterogeneity in these types of reviews is required.
Of the 235 systematic reviews analyzed, only 9 (3.8%) included a formal quality assessment or process to establish certainty of the entire body of evidence. Separate to critical appraisal, quality assessment of the entire body of evidence considers other factors in addition to methodological assessment that may impact on the subsequent recommendations drawn from such evidence [23]. Of these 9 reviews, only 4 (1.7%) followed the GRADE approach [24] which is now the commonplace methodology to reliably and sensibly rate the quality of the body of evidence. The considerably small number of identified reviews that included formal quality assessment of the entire body of evidence might be linked to the lack of formal guidance from the GRADE working group. The GRADE approach was developed to assess issues related to interventions (using evidence either from interventional or observational studies) and for diagnostic tests, however there are several extensions for the application of GRADE methodology. While there is no formal guidance for GRADE in systematic reviews of prevalence, there is some guidance into the use of GRADE for baseline risk or overall prognosis [25] which may be useful for these types of reviews. While no formal guidance exists, using the guidance cited above can serve as an interim method and is recommended by the authors for all future systematic reviews of prevalence. Whilst the number of identified studies that utilized GRADE in particular is small, it is encouraging to see the adoption of these methods in systematic reviews of prevalence data, as these examples will help contribute to the design of formal guidance for this data type in future.
It is reasonable to expect some differences in how these types of reviews are conducted, as different authors groups will rationally disagree regarding key issues, such as whether Bayesian approaches are best or which is the ideal method for transforming data. However, the wide inconsistency and variability noted in our review are far beyond the range of what could be considered reasonable and is of considerable concern for a number of reasons: this lack of standardization may encourage unnecessary duplication of reviews as reviewers approach similar questions with their own preferred methods; (2) novice reviewers searching for exemplar reviews may follow inadequate methods as they conduct these reviews; (3) the general confusion in end users, peer reviewers and readers of these reviews as they are required to become accustomed to various ways of conducting these studies; (4) review authors themselves following inadequate approaches, missing key steps or information sources, and importantly increasing the potential for review authors to report inaccurate or misleading summarized estimates; (5) a lack of standardization across reviews limits the ability to streamline, automate or use artificial intelligence to assist systematic review production, which is a burgeoning field of inquiry and research; and (6) these poorly conducted and reported reviews may have limited or even detrimental impacts on the planning and provision of healthcare. As such, we urgently call for the following to occur to rectify these issues, (1); further methods development in this field and for updated guidance on the conduct of these types of reviews, (2); we urgently require a reporting standard for these reviews (such as an extension to PRISMA), (4); the development (or endorsement) of a tailored risk of bias tool for studies reporting prevalence estimates, (5); the further development and promotion of software [26] and training materials [27] for these review types to support authors conducting these reviews.

Limitations of our study
In this study, we have collated and interrogated the largest dataset of systematic reviews of prevalence currently available. Although only a sample, it is likely to be representative of all published prevalence systematic reviews, although it is important that we acknowledge we only searched MED-LINE over a period of 1 year using a search strategy that only retrieved studies with the terms "prevalence" and "systematic review" in the title. There is potential that systematic reviews of prevalence published in journals not indexed (or not published at all) are meaningfully different from those characterized in our results. However, given that reviews indexed on MEDLINE are (hypothetically) likely to be of higher quality than those not indexed, and given that we still identified substantial inconsistency, variability and potentially inappropriate practices in this sample, we doubt that a broader search will have altered our main conclusions significantly. Similarly, we believe that reviews that do not use the terms "prevalence" and "systematic review" in the title terms would have, overall, even more inappropriate methods. Regarding the timeframe limitation, we decided to include only reviews recently published because older reviews may not reflect the current practice.

Conclusions
This meta-epidemiological review found that among this sample of published systematic reviews of prevalence, there are considerable discrepancies in terms of conduct, reporting, risk of bias assessment and data synthesis. This variability is understandable given the limited amount of guidance in this field, the lack of a reporting standard and a widely accepted risk of bias or critical appraisal tool. Our findings are a call to action to the evidence synthesis community to develop guidance and reporting standards urgently for these types of systematic reviews.