A meta-review demonstrates improved reporting quality of qualitative reviews following the publication of the COREQ and ENTREQ checklists, despite modest uptake

Background: Reviews of qualitative studies allow for a deeper understanding of concepts and findings beyond the individual qualitative studies. Concerns about study reporting quality led to the publication of the COREQ guidelines for qualitative studies in 2007, followed by the ENTREQ guidelines for qualitative reviews in 2012. The aims of this meta-review are: 1) to investigate the uptake of the COREQ and ENTREQ checklists in qualitative reviews; and 2) to compare the reporting quality of the primary qualitative studies included within these reviews before and after COREQ publication.

Methods: Reviews were searched on 2 September 2020 and categorized as (1) COREQ-using, (2) ENTREQ-using, (3) using both, or (4) using neither checklist. Proportions of usage were calculated over time. COREQ scores of the primary studies included in these reviews were compared before and after COREQ publication using t-tests with Bonferroni correction.

Results: 1,695 qualitative reviews were included (222 COREQ, 369 ENTREQ, 62 both COREQ/ENTREQ and 1,042 non-COREQ/ENTREQ), spanning 12 years (2007–2019) and demonstrating an exponential publication rate. The uptake of the ENTREQ in reviews is higher than that of the COREQ (28% versus 17%) and increases over time. COREQ scores could be extracted from 139 reviews (comprising 2,775 appraisals). Reporting quality improved following the COREQ publication, with 13 of the 32 signalling questions showing improvement; the average total score increased from 15.15 to 17.74 (p < 0.001).

Conclusion: The number of qualitative reviews increased exponentially, but the overall uptake of the COREQ and ENTREQ was modest. Primary qualitative studies show a positive trend in reporting quality, which may have been facilitated by the publication of the COREQ.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12874-021-01363-1.


Introduction
Qualitative studies allow for a deeper understanding of people's experiences, beliefs, attitudes or behaviours. These studies usually focus on why participants think or act in a certain way, using open-ended data-gathering methods such as interviews, focus groups or observations [1,2]. They can be regarded as hypothesis-generating research, and although their research methods differ fundamentally from those of quantitative research, the two are not incompatible or mutually exclusive. Both methods can complement each other: for example, hypotheses that originate from qualitative research may be statistically tested in quantitative research, and findings from quantitative research can be explained by qualitative research [3,4]. As in all fields of research, poorly designed, conducted or reported qualitative studies can lead to inappropriate findings [5].
In 2007, the COREQ (Consolidated criteria for reporting qualitative research) checklist was developed to assess the reporting quality of qualitative studies [6]. Realizing that, in contrast to most other research fields, no widely used comprehensive checklist nor uniform and accepted requirements for the publication of qualitative research existed, the authors aimed to "… promote complete and transparent reporting among researchers and indirectly improve the rigor, comprehensiveness and credibility of interview and focus-group studies." [6] Items from 22 published checklists were compiled into a single 32-item checklist grouped into three domains (research team and reflexivity, study design, and analysis and findings), thus creating a comprehensive checklist covering the main aspects of qualitative research.
Though aimed at researchers conducting an interview or focus-group study, the COREQ also became frequently used in reviews of qualitative studies to assess the reporting quality of the included studies, in the absence of a checklist specifically developed for this purpose. Qualitative reviews, a novel study design, aim to systematically synthesize the included qualitative studies, instead of generating original data, to achieve abstraction and transferability at a level beyond the included original studies [7,8]. While the number of qualitative reviews was still relatively limited in 2007, when the COREQ was published, by 2012 this number had increased substantially. Thus, using a similar approach to the COREQ, in 2012 members of the same research team and international experts developed the ENTREQ (Enhancing transparency in reporting the synthesis of qualitative research) checklist, for reviews as opposed to original studies [9]. This 21-item checklist covers five domains (introduction, methods and methodology, literature search and selection, appraisal, and synthesis of findings) and aims to "… develop a framework for reporting the synthesis of qualitative health research." [9] Since the publication of both checklists, a large number of reviews of qualitative studies have been published on a wide array of topics. Though it has been argued that reporting checklists for qualitative research do not necessarily result in better research [10], and neither checklist was developed following the now accepted methods for developing reporting standards [11], both the COREQ and the ENTREQ are now included in the EQUATOR network [12] and are required by many clinical journals for submission; the high numbers of citations (over 5,600 and 700 in Web of Science, respectively) indeed indicate usage.
To date, however, no studies have explored the uptake of the COREQ and the ENTREQ in reviews, or their effect on reporting quality, as has been done for guidelines in other research fields [13-18]. Therefore, the aim of this meta-review is twofold: 1) to investigate the uptake of the COREQ and ENTREQ checklists in reviews of primary qualitative studies, and 2) to compare the quality of reporting of the original qualitative studies included in these reviews before and after publication of the COREQ.

Methods
This meta-review was reported in line with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [19].

Search strategy
Using similar search methods to previous studies, we developed three searches: the first aimed to identify all qualitative reviews that cited the COREQ, and the second all reviews that cited the ENTREQ; for these two searches, we used Web of Science and PubMed. Next, using terms encountered in these reviews, and building upon previous studies [20-23], we developed a comprehensive search method in PubMed to identify those reviews that did not specifically mention the COREQ or the ENTREQ. We then refined this broad search in an iterative process, described in detail in the supplement, section A, and adapted the query to four other electronic databases: Cochrane Library, Embase, Emcare and Web of Science. Searches were designed in collaboration with an experienced medical librarian and conducted on 2 September 2020, including all articles since database inception (which differed per database). We then subtracted the results of the first two searches from this dataset. In the end, we thus obtained three databases: 1) studies citing the COREQ, 2) studies citing the ENTREQ, and 3) studies citing neither the COREQ nor the ENTREQ.

Eligibility criteria
Studies were eligible for inclusion if they were 1) a review and 2) contained qualitative or mixed-methods research approaches. We created four datasets: reviews using 1) the COREQ, 2) the ENTREQ, 3) both the COREQ and ENTREQ, or 4) neither the COREQ nor the ENTREQ. To be included in the respective datasets, reviews using the COREQ were required to appraise their included studies with this checklist; those using the ENTREQ were required to mention adherence to it. Reviews were imported into EndNote (version 9.1) and duplicates were removed. One author (YdJ) screened the titles for obvious irrelevance. Two authors (YdJ and JM) independently selected studies for eligibility based on abstract and full text; conflicts were resolved through discussion. The selection procedure is explained in more detail in the supplement, section A.

Data-extraction
Our study aimed to assess the uptake of the COREQ and ENTREQ checklists in reviews, but also to explore the effect of the COREQ on the reporting quality of the original qualitative studies included in these reviews. For all reviews, we extracted the number of included qualitative studies, studies with mixed-method designs, and other designs (e.g. quantitative, reviews, etc.). For the first aim, we used the publication date of each review from its meta-data, rounded down to the month (i.e. MM/YYYY); if unavailable, we searched for the earliest publication date in online sources. For the second aim, we extracted the publication year (i.e. YYYY) and the COREQ scores of the original studies included in these reviews, as scored by the authors of these reviews (i.e. we did not rate the studies ourselves, but used the COREQ scores as determined by the review authors, as illustrated in supplemental Fig. S1). Data were extracted at three levels, based on availability: the score per signalling question (reported or not reported; 0 or 1), the total score per domain (0-8 for domain 1, 0-15 for domain 2, and 0-9 for domain 3), and the overall total score (0-32), where applicable. If no extractable information was available (e.g. no per-study COREQ score, but only an average per domain), the corresponding author of that review was contacted. Data extraction was conducted by YdJ, JM, EvdW and CV, all experienced in qualitative research and familiar with both the COREQ and ENTREQ checklists.
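The three-level extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the item-to-domain grouping (items 1-8, 9-23 and 24-32) follows the published COREQ checklist, while the function names and data layout are our own assumptions.

```python
# Sketch: aggregating COREQ appraisals extracted at the level of
# signalling questions (0 = not reported, 1 = reported) into domain
# scores and an overall total score. Missing items propagate as None,
# mirroring the complete-case handling described in the methods.
DOMAINS = {1: range(1, 9),    # research team and reflexivity (8 items)
           2: range(9, 24),   # study design (15 items)
           3: range(24, 33)}  # analysis and findings (9 items)

def domain_scores(items):
    """items: dict mapping COREQ item number (1-32) to 0/1."""
    scores = {}
    for d, rng in DOMAINS.items():
        vals = [items.get(i) for i in rng]
        # a domain score is only computed when every item was scored
        scores[d] = sum(vals) if None not in vals else None
    return scores

def total_score(items):
    ds = domain_scores(items)
    return sum(ds.values()) if None not in ds.values() else None
```

A fully scored study thus yields domain scores of at most 8, 15 and 9, and a total of at most 32, matching the score ranges described above.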

Statistical analysis
For the first aim, to investigate the uptake of the COREQ and the ENTREQ, we plotted the number of qualitative reviews using these checklists against those that did not, over time, starting from the respective publication dates (i.e. 09-2007 and 11-2012). For the second aim, to assess whether the publication of the COREQ influenced the reporting quality of qualitative studies, we compared the average scores at the three levels (total score, domain scores and signalling questions) before publication of the COREQ (pre-COREQ: all studies before 2007) and after publication (post-COREQ: 2009-2019). Articles published in 2007 and 2008 were excluded, as the COREQ was published in September 2007 and this was regarded as a transition period (see Fig. 1). We used this transition period to avoid including studies that used a preliminary version of the COREQ (which was presented at a congress prior to publication; personal communication with Prof. A. Tong), and to exclude studies that were in the submission process at the time of publication. To visualize the trends in the total COREQ score per domain, we plotted the absolute score over time, using a LOESS curve with a 95% confidence interval and a span of 0.5. Average scores, as opposed to median scores, were calculated as in similar prior studies [24,25], as this allows comparison at the level of signalling questions, increases the precision of the estimated effect, and, though fundamentally different from LOESS modelling, allows better comparison with these curves than median scores would. To compare the average scores before and after publication, we used unpaired t-tests. As some COREQ scores were missing, analyses were performed on complete cases. A significance level of p ≤ 0.05 was used, corrected for multiple testing using the Bonferroni approach.
For the COREQ-analyses, we used a significance level of p < 0.0014 (0.05 divided by a total of 36 significance tests: 32 signalling questions, three domains and one for the total COREQ score). Analyses were performed in R, version 1.2.5001.
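As a sketch of this comparison (the paper's analyses were performed in R; this Python fragment only illustrates the logic, here using Welch's unpaired t statistic and the Bonferroni threshold quoted above):

```python
import math
from statistics import mean, variance

def welch_t(pre, post):
    """Unpaired (Welch) t statistic comparing pre- vs post-COREQ scores."""
    se = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    return (mean(post) - mean(pre)) / se

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction;
    with alpha = 0.05 and 36 tests (32 signalling questions, 3 domains,
    1 total score) this gives roughly 0.0014, as used in the paper."""
    return alpha / n_tests
```

Whether the authors used Student's or Welch's variant of the unpaired t-test is not stated; the multiple-testing correction is identical either way.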

Sensitivity analyses
We conducted three sensitivity analyses, all related to the second aim. 1) An analysis comparing the COREQ scores before and after publication without the transition period. 2) An analysis after imputation of missing COREQ scores, since a substantial number of reviews presented an adapted or incomplete COREQ score, usually without explanation. We assumed these data to be missing at random (MAR) and performed multiple imputation with five imputed datasets using the R package MICE; estimates were pooled according to Rubin's rules.
3) An analysis of the effect of the inclusion of duplicate studies across reviews. Studies were considered a duplicate if the year of publication and name of the first author were identical. A detailed description of the sensitivity analyses is presented in the supplement, section C.
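The second and third sensitivity analyses can be sketched in a few lines. The imputation itself was done in R with the MICE package; this Python fragment only illustrates Rubin's pooling rules and the duplicate criterion, and the record field names (`year`, `first_author`) are illustrative assumptions.

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Pool a parameter estimated on m imputed datasets (Rubin's rules):
    the pooled estimate is the mean of the per-dataset estimates; the
    total variance is the mean within-imputation variance plus
    (1 + 1/m) times the between-imputation variance."""
    m = len(estimates)
    w = mean(within_variances)   # within-imputation variance
    b = variance(estimates)      # between-imputation variance
    return mean(estimates), w + (1 + 1 / m) * b

def flag_duplicates(records):
    """Flag repeated appraisals of the same primary study across
    reviews: two records count as duplicates when publication year
    and first-author name are identical."""
    seen, flags = set(), []
    for rec in records:
        key = (rec["year"], rec["first_author"].strip().lower())
        flags.append(key in seen)
        seen.add(key)
    return flags
```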

Results

Characteristics of included studies
The three searches resulted in a total of 1,695 eligible reviews: 222 reviews used the COREQ for appraisal of their included studies, 369 used the ENTREQ, 62 used both the COREQ and ENTREQ, and 1,042 used neither (Fig. 2). These 1,695 reviews included a total of 49,281 studies (median 19 studies per review, IQR 12-32), most of which were qualitative (78%; 38,279 studies; median 14 studies per review). The remaining studies were of mixed-methods (4%; 2,177 studies; median 2 studies per review, IQR 1-4) and other methodology (18%; 8,825 studies; median 11 studies per review). A summary of the included reviews is presented in Table 1; an overview of all included reviews is given in the supplement, section D.

Characteristics of reviews using the COREQ
Of the 282 reviews that used the COREQ (i.e. 222 reviews using the COREQ alone; 62 using both the COREQ and ENTREQ), most presented their appraisal results in a table (n = 193; 68%), in text only (n = 37; 13%) or in a bar chart (n = 3; 1%). A large number of reviews appraised their included studies with the COREQ but did not present the results (n = 49; 17%). A total of 139 (49%) of the 282 reviews presented extractable data on individual studies, which were used to explore the trends in COREQ scores over time. Of these 139 reviews, data were presented at the level of signalling questions for 110 (79%), at domain level for 12 (9%) and as a total score for 17 (12%). In total, 2,775 COREQ appraisals of qualitative studies were extracted: 2,448 at the level of signalling questions, 200 at domain-score level and 127 at overall total-score level. In more than half of the reviews, the COREQ checklist was adapted for study purposes (e.g. item exclusion) or COREQ scores were incompletely reported: only 47 of the 110 reviews that reported at the level of signalling questions scored at least one of their included studies on all 32 signalling questions.

The median completeness of the 32 COREQ items was 25 (IQR 23-32; range 1-32); for the completeness of the individual signalling questions, see Table 2. As we used only the complete scores for our analyses (i.e. a complete-case analysis), the numbers of appraisals included in the analyses for COREQ domains 1 to 3 were 1,036, 1,117 and 1,086 respectively, and 831 appraisals for the overall total COREQ score.
First aim: uptake of COREQ and ENTREQ over time
The total number of reviews on qualitative studies increased exponentially over time (Fig. 3A). Until the publication of the COREQ in September 2007, only 31 reviews were identified; this number had increased to 141 at the publication of the ENTREQ in November 2012. Of the 1,664 reviews published since the COREQ publication, 284 (17%) used the COREQ to assess the reporting quality of their included studies, a proportion that remained stable over time (Fig. 3B and C). For the ENTREQ, 431 (28%) of the 1,554 reviews published since its publication used this checklist, with this proportion increasing over time (Fig. 3B and D).

Second aim: reporting quality before and after COREQ publication
The comparison of the COREQ scores per signalling question is presented in Table 2; the positive trendline for each of the three domains is visualized in Fig. 4.

Sensitivity analyses
When comparing the COREQ scores without the transition period, the improvement was less pronounced, with 11 of the 32 signalling questions showing changes after Bonferroni correction (one negative, the others positive).

Discussion
In this meta-review, we explored the uptake of the COREQ and ENTREQ checklists in qualitative reviews, and compared the reporting quality of original qualitative studies before and after COREQ publication. Though reviews of qualitative research are a novel methodology to achieve abstraction beyond the original qualitative studies, we demonstrated an exponential publication trend over the past twenty years. By including 1,695 reviews, which in turn included 49,281 studies, we were able to present an in-depth overview of current qualitative research, both at the level of reviews and at the level of the individual studies included within these reviews. Answering the first research question, we found that the COREQ, published in 2007 to score the reporting quality of original qualitative studies, was used in 17% of the reviews to appraise the reporting quality of their included studies. The ENTREQ, published in 2012 specifically for systematic reviews, showed a better uptake, with 28% of the reviews using the checklist. Finally, using the COREQ scores of 2,775 studies within these reviews, we demonstrated a positive trend in reporting quality since the publication of the COREQ, with 13 of the 32 signalling questions showing improvement. The uptake of the COREQ in qualitative reviews may be explained by its original aim, namely to improve the quality of reporting in original interview or focus-group studies [6]. In the absence of a comprehensive checklist for appraising the reporting quality of studies within qualitative reviews, the use of the COREQ for this purpose may have followed naturally from the increasing number of qualitative reviews since its publication. The ENTREQ, specifically designed for reviews, showed a higher uptake [9]. Yet, appraising qualitative studies remains a debated topic.
While some argue that adhering to checklists improves the transparency and validity of findings, others see endorsement as a limitation, arguing that a 'one size fits all' set of criteria cannot encompass the breadth of qualitative research as a whole [5, 26-29]. In our study, this unresolved debate is clearly illustrated by the large number of reviews that adapted the COREQ for their purposes: more than half of the studies assessed their included studies with a selection of COREQ items, or combined it with other checklists designed for reporting or overall quality assessment, such as the CASP [30], QualSyst [31], GRADE-CERQual [32] and MMAT [33], amongst others.

Table 1 Summary of the 1,695 included qualitative reviews, grouped as COREQ- or ENTREQ-using, using both checklists, or using neither checklist. An overview of each included review is presented in the supplement, section D. *Other study design includes all studies that are neither qualitative nor mixed methods (e.g. quantitative, reviews, etc.)
The incomplete reporting, and the limited uptake of the COREQ and ENTREQ, is not unique to qualitative research. For example, impact studies on guidelines used for quantitative reviews [19], clinical trials [13,34], observational studies [15,16], and prediction or prognostic studies [14,17] show that, even with journal endorsement, the completeness of reporting remains suboptimal, although for some, reporting quality improved. By extracting the 2,775 COREQ appraisals included in these reviews, we were able to observe changes in the quality of reporting over time. On average, the total score, one of the three domains, and nearly half of the 32 signalling questions showed improvement when comparing studies published before versus after publication of the COREQ. Though causal inferences cannot be made, this improvement, especially viewed in combination with the exponential trend in qualitative review publications, reflects the maturation and increasing acceptance of qualitative research. Although the overall quality of reporting improved, the scores of some items remained remarkably low: 16 of the 32 signalling questions had an average score below 0.5. For example, in the first domain ("research team and reflexivity"), the items "experience and training", "relationship established" and "participant knowledge of the interviewer" were reported poorly and did not improve markedly, with average scores of 0.25, 0.18 and 0.16, meaning that only 25, 18 and 16% of the articles reported these items, respectively. In the second domain ("study design"), most items were reported better than in the first domain, and the improvements were stronger. Nearly all items improved, and almost half remained significant after Bonferroni correction for multiple testing.
The third domain ("analysis and findings") showed good reporting on nearly all items, except for "software" and "participant checking", though the former showed the largest improvement of all 32 COREQ items. These findings are in line with the two other studies that graded qualitative studies for the same purpose: Al-Moghrabi et al. graded 100 qualitative studies and demonstrated poor quality of reporting for most signalling questions [31]. In the second study, Godinho et al. confirmed this poor completeness of reporting in 246 Indian qualitative studies [24,25]. When plotting the results over time, completeness of reporting remained modest [30].

The strengths of this study are its large sample size and comprehensive search methods. We conducted our study on reviews of qualitative studies (i.e. a meta-review). This approach allowed for exploration of checklist usage within a single study type, namely reviews. Furthermore, the original qualitative studies included in these reviews were assessed for reporting quality by the authors of these reviews, assuring independent quality assessment and allowing a large number of study appraisals to be included. We aimed to include as many studies as possible in order to present a comprehensive overview of all qualitative reviews. However, because of this large sample size, we did not perform complete cross-checking at two levels: title selection and data extraction. We did cross-check the abstracts and full texts for inclusion, showing excellent agreement (Cohen's kappa coefficients for inter-rater reliability of 0.86 and 1.00, respectively). Data extraction was cross-checked for 10 reviews, showing no errors. Furthermore, nearly all COREQ scores could be extracted directly by recoding the COREQ tables to our format, instead of typing the scores into our data system, thus reducing the risk of errors. Next, though misclassification of study type could be a more serious issue (e.g. misclassifying a qualitative study design as mixed methods), all authors used the same methodology to classify the study types, as detailed in the supplement. Another limitation related to the COREQ score is selection bias: studies of higher quality may have been easier to find in database searches than those of lower quality (e.g. because of the use of identifiable terms such as 'thematic synthesis' or 'grounded theory'), possibly resulting in overestimation of the average COREQ scores. Furthermore, some review authors might have excluded studies based on their COREQ score, which would also result in an overestimation of the COREQ scores. Since the publication of the COREQ and ENTREQ, various new checklists have been published, both for appraising the reporting quality and the overall study quality (e.g. the CASP in 2013 [30], the SRQR checklist in 2014 and the eMERGe in 2019), underlining the developments in this research field since these guidelines. The use of these newer checklists might partly explain the limited uptake of the COREQ and the ENTREQ; however, we believe this effect to be limited, since most reviews that used neither the COREQ nor the ENTREQ did not use any other checklist either. Another explanation for the limited uptake may be improved retrievability of post-COREQ and post-ENTREQ studies: including terms such as 'adhering to' or 'appraising', or naming these checklists, likely increased the likelihood of inclusion in our review compared with studies published prior to these guidelines. To mitigate this, we based our search on previous studies [22,23], designed our queries together with an experienced medical librarian and used iterative search methods; we therefore believe this effect to be minimal. Lastly, it cannot be inferred that differences before and after publication of the COREQ and ENTREQ are causally related to the publication of these checklists.
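The inter-rater agreement reported above can be quantified with Cohen's kappa; a minimal sketch (our own helper, written in Python rather than the R used for the paper's analyses):

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical decisions
    (e.g. include/exclude during abstract and full-text screening)."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    p_expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                     for c in categories)
    if p_expected == 1:  # both raters constant: no disagreement possible
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)
```

Values of 0.86 and 1.00, as reported for the abstract and full-text stages, fall in the "almost perfect" band of conventional interpretation scales.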

Implications and conclusion
Our study highlights several points that may further improve the quality of reporting. First, surprisingly, almost a fifth of the reviews that used the COREQ did not present the results of their quality appraisal. Given that four of the 21 ENTREQ items, as well as four of the 27 PRISMA items, concern study appraisal, reporting appraisal results should be the minimum.
Ideally, however, to facilitate meta-reviews of this kind and to increase transparency and reproducibility, appraisal results should be reported per individual study at the level of signalling questions. Next, though we did not explore the characteristics of the authors of the included reviews, it can reasonably be assumed that the exponential publication trend is partly explained by an increasing number of unique authors. Whether articles should be scored rather than appraised descriptively remains open for discussion. However, the use of these checklists might benefit new or inexperienced authors designing a qualitative study: checklists may guide those unfamiliar with qualitative research with hints and directions to avoid commonly made mistakes [5,10,27,35]. The same holds true for reviewers assessing a qualitative review for publication, particularly if the reviewer has content expertise but not methodological expertise. A final implication concerns the poor reporting of several signalling questions of the COREQ. Whether these items are intentionally or unintentionally underreported, our study clearly points towards items that might either actually improve qualitative research if reported, or be left out of the checklist in a possible later or updated version. By providing this information on a large number of qualitative studies, our study might thus facilitate the ongoing discussions by providing factual data on both the use of checklists and the completeness of reporting.