- Research article
- Open Access
Database selection and data gathering methods in systematic reviews of qualitative research regarding diabetes mellitus - an explorative study
BMC Medical Research Methodology volume 21, Article number: 94 (2021)
Systematic reviews (SRs) are considered one of the most reliable types of studies in evidence-based medicine. SRs rely on comprehensive and systematic data gathering, including searches of academic literature databases. This study aimed to investigate which combination of databases would yield the highest overall recall rate of references when conducting SRs of qualitative research regarding diabetes mellitus. Furthermore, we aimed to investigate the current use of databases and other sources for data collection.
Twenty-six SRs (published between 2010 and 2020) of qualitative research regarding diabetes mellitus, located through PubMed, met the inclusion criteria. The references included in these SRs were systematically hand searched in six academic literature databases (CINAHL, MEDLINE/PubMed, PsycINFO, Embase, Web of Science, and Scopus) and in the academic search engine Google Scholar. Recall rates were calculated as the number of included references retrieved by a database or database combination divided by the total number of included references, expressed as a percentage.
The SRs searched five databases on average (range two to nine). MEDLINE/PubMed was the most commonly searched database (100% of SRs). In addition to academic databases, 18 of the 26 (69%) SRs hand searched the reference lists of included articles. This technique resulted in a median (IQR) of 2.5 (one to six) additional references being included per SR compared with database searches alone. Twenty-seven (5.4%) references were found in only one of the six databases (when Google Scholar was excluded), with CINAHL retrieving the highest number of unique references (n = 15). With Google Scholar excluded, the combinations of MEDLINE/PubMed and CINAHL (96.4%) and of MEDLINE/PubMed, CINAHL, and Embase (98.8%) yielded the highest overall recall rates.
We found that the combinations of MEDLINE/PubMed and CINAHL and of MEDLINE/PubMed, CINAHL, and Embase yielded the highest overall recall rates of references included in SRs of qualitative research regarding diabetes mellitus. However, other combinations of databases yielded similar recall rates and are expected to perform comparably. Google Scholar can be a useful supplement to traditional scientific databases to ensure optimal and comprehensive retrieval of relevant references.
Systematic reviews (SRs) are thorough reviews of the literature on a clearly defined research question and are considered one of the most reliable types of studies in evidence-based medicine. Investigators are advised to search multiple academic databases and reference lists when conducting SRs [1, 2]. For SRs of randomized controlled trials (RCTs), the Cochrane Handbook recommends searching at least the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE and Embase, and applying a MEDLINE search strategy that includes a) a term for the health condition of interest, b) the intervention being evaluated, and c) the study design. For qualitative evidence synthesis, Cochrane suggests purposive sampling instead of the exhaustive approaches used for quantitative research, and recommends placing extra emphasis on searching grey literature and local databases. Academic search engines such as Google Scholar and Microsoft Academic Search are an alternative to traditional academic literature databases. These search engines are free of charge and “crawl” the internet for relevant academic literature rather than searching peer-reviewed published literature within a database. As a result, academic search engines can also find grey literature (documents not published by commercial publishers), such as academic theses and organization reports, which may reduce publication bias in a SR.
Searching multiple academic databases and search engines can be tedious, as each has its own interface and requires its own search string. For example, the syntax for Boolean operators, phrase searching, truncation and parentheses can all differ between databases. Therefore, to improve search quality, involving an information specialist is generally recommended [2, 5, 6]. Investigators naturally want to know how many databases must be searched to retrieve an adequate share of the relevant references when conducting a SR.
However, it is equally important to know which databases give the broadest search results and the highest likelihood of unique references, i.e., references not found elsewhere within a given field. These questions have previously been investigated for qualitative research in general, within the field of depression, and in quantitative research [9,10,11,12,13,14,15], including one study on diabetes mellitus. However, no previous study has investigated SRs of qualitative research regarding diabetes mellitus. Diabetes mellitus is one of the most frequent chronic diseases of the twenty-first century, with a global prevalence estimated at 463 million people (9.3%) in 2019 and projected to increase to 700 million (10.9%) by 2045. It is a disease that demands rigorous and comprehensive care, as patients need to manage diet, exercise and medication, and attend health check-ups with podiatrists, ophthalmologists and general practitioners or endocrinologists. At the same time, patients do not necessarily experience symptoms of the disease. Compliance is therefore a substantial problem in this patient group, resulting in a high incidence of complications, and it is essential to understand the barriers to patients’ compliance. Unlike quantitative studies, qualitative studies offer an opportunity to understand the points of view of clinicians, caregivers, relatives and, most importantly, patients. In the field of diabetes mellitus, qualitative studies give insight into which measures succeed in maintaining compliance and how they affect patients’ lives. This study aimed to investigate which combination of academic literature databases and academic search engines would result in the highest recall rate of references when conducting SRs of qualitative research regarding diabetes mellitus.
Furthermore, we aimed to investigate the current use of academic literature databases and search engines (hereafter jointly referred to as databases), information specialists and additional data gathering methods.
Inclusion and exclusion of SRs
SRs of qualitative research regarding diabetes mellitus were retrieved from PubMed, covering all entries prior to the search date (January 25, 2021). The search terms (“Qualitative Research” [MeSH]) and (“Diabetes Mellitus” [MeSH]) were combined using the Boolean operator “AND”, and the “Systematic reviews” filter was applied. Although no language restriction was applied, the search yielded only English-language results. Likewise, no restriction on year of publication was applied. The full texts of the SRs were systematically evaluated against the inclusion and exclusion criteria. Inclusion criteria were SRs of either qualitative or mixed methods (both qualitative and quantitative) research regarding any subtype of diabetes mellitus. Exclusion criteria were a) no full list of the databases searched for data collection in the SR, b) included references that could not be extracted from the reference list or supplementary data, c) SRs focusing solely on diseases other than diabetes mellitus, or d) purely quantitative SRs. Details of the variables collected from each SR and its references are summarized in Table 1.
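The PubMed search described above can be made reproducible by scripting it. The sketch below only builds (and does not send) an NCBI E-utilities `esearch` request URL using the Python standard library. Several details are assumptions not stated in the article: that the “Systematic reviews” filter corresponds to PubMed’s `systematic[sb]` subset, that “entries prior to the search date” is best approximated by an Entrez-date (`edat`) upper bound of 2021/01/24, and that the MeSH terms are written with the full `[MeSH Terms]` field tag. The function names `build_esearch_url` and `query_params` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(max_date: str = "2021/01/24", retmax: int = 100) -> str:
    """Build (but do not send) an esearch URL for the MeSH-based SR search.

    Assumptions, not taken from the article: the "Systematic reviews" filter
    is expressed as systematic[sb], and the cut-off uses the Entrez date.
    """
    term = (
        '"Qualitative Research"[MeSH Terms] '
        'AND "Diabetes Mellitus"[MeSH Terms] '
        "AND systematic[sb]"
    )
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",   # Entrez date: when the record was added to PubMed
        "mindate": "1900",    # mindate and maxdate must be supplied together
        "maxdate": max_date,  # last day before the search date
        "retmax": retmax,     # maximum number of PMIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def query_params(url: str) -> dict:
    """Decode a URL's query string back into a dict of lists, for inspection."""
    return parse_qs(urlparse(url).query)
```

Calling `build_esearch_url()` returns a URL that can be opened in a browser or fetched with any HTTP client; logging it alongside the search date documents exactly which query and cut-off were used.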
Inclusion and exclusion of references from SRs
A list of all included references was extracted from each SR, and each reference was evaluated against the inclusion and exclusion criteria. The inclusion criterion was that the reference had been included in one of the included SRs. Exclusion criteria were a) quantitative studies included in mixed methods SRs, b) references concerning diseases other than diabetes mellitus included in SRs of multiple diseases, and c) unpublished references. Figure 1 illustrates the inclusion process for the SRs and their references. All references were systematically hand searched in seven databases: CINAHL, MEDLINE/PubMed, PsycINFO, Embase, Web of Science, Scopus, and Google Scholar. The Social Sciences Citation Index (SSCI) was not investigated in this study, as it is one of six databases already included in Web of Science. MEDLINE and PubMed were treated as one database because PubMed includes all MEDLINE references. Each reference was first searched by title. If the title search did not retrieve the reference, further searches were performed, first using the basic search functions and then using keywords, authors, and journals. For each reference, we documented whether it was found and in which of the databases.
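The hand-search protocol above (a title search first, then fallback searches, with the outcome recorded per database) can be sketched as an ordered cascade of lookup strategies. This is an illustration only: the real searches were done by hand in each database’s own interface, so the in-memory catalogues, the single “author+journal” fallback standing in for the keyword, author, and journal searches, and the names `make_strategies`, `locate`, and `presence_record` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Reference = Dict[str, str]
# A lookup strategy is a (name, predicate) pair; the predicate reports a hit.
Strategy = Tuple[str, Callable[[Reference], bool]]


def make_strategies(catalogue: List[Reference]) -> List[Strategy]:
    """Hypothetical stand-in for one database: title search first, then author + journal."""
    titles = {r["title"].casefold() for r in catalogue}
    author_journal = {(r["author"], r["journal"]) for r in catalogue}
    return [
        ("title", lambda ref: ref["title"].casefold() in titles),
        ("author+journal", lambda ref: (ref["author"], ref["journal"]) in author_journal),
    ]


def locate(ref: Reference, strategies: List[Strategy]) -> Optional[str]:
    """Try strategies in order; return the name of the first one that finds ref."""
    for name, finds in strategies:
        if finds(ref):
            return name
    return None  # not retrieved by this database


def presence_record(ref: Reference, databases: Dict[str, List[Strategy]]) -> Dict[str, Optional[str]]:
    """Record, per database, whether the reference was found and by which strategy."""
    return {db: locate(ref, strategies) for db, strategies in databases.items()}


# Toy example with made-up catalogues and a made-up reference.
databases = {
    "DB_A": make_strategies([{"title": "Living with diabetes", "author": "Doe", "journal": "J Example"}]),
    "DB_B": make_strategies([{"title": "Living with diabetes: a qualitative study",
                              "author": "Doe", "journal": "J Example"}]),
    "DB_C": make_strategies([]),
}
ref = {"title": "Living with Diabetes", "author": "Doe", "journal": "J Example"}
record = presence_record(ref, databases)
found_in = {db for db, how in record.items() if how is not None}
```

Here `record` shows that DB_A matched on title, DB_B only through the fallback (its indexed title differs), and DB_C not at all; the set `found_in` is the per-reference presence data on which recall and unique-reference counts are based.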
The number and frequency of databases searched were described in absolute numbers and as mean, median, and interquartile range (IQR). The association between the number of databases searched and the year of publication was analyzed using Poisson regression. Searches of reference lists were described in absolute numbers and as median and range. The number of references contributed by each individual database and by each combination of databases was calculated in absolute numbers and as recall (for combinations, combined recall). Recall was calculated as the number of included references retrieved by a database or database combination divided by the total number of included references, expressed as a percentage. All recall rates and numbers of unique references per database were calculated first with all seven databases and then with Google Scholar excluded, since Google Scholar’s precision in structured literature searches has been reported to be low despite high recall (a topic addressed further in the Discussion section) [22, 23]. All statistical analyses were performed in R (v. 4.0.2) using RStudio for Windows (v. 1.3.1093).
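The recall definition above amounts to a simple count: for a set of databases S, recall(S) = 100 × (number of included references retrieved by at least one database in S) / (total number of included references). A minimal Python sketch follows, assuming the hand-search results are stored as a mapping from each included reference to the set of databases that retrieved it (an empty set if none did). The data and the function names `combined_recall` and `recall_table` are hypothetical; the article’s own calculations were done in R.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Reference id -> set of databases that retrieved it.
Presence = Dict[str, Set[str]]


def combined_recall(presence: Presence, combo: Iterable[str]) -> float:
    """Percentage of included references retrieved by at least one database in combo."""
    chosen = set(combo)
    retrieved = sum(1 for found_in in presence.values() if found_in & chosen)
    return 100.0 * retrieved / len(presence)  # denominator: all included references


def recall_table(presence: Presence, databases: List[str], k: int) -> List[Tuple[Tuple[str, ...], float]]:
    """Combined recall for every combination of k databases."""
    return [(combo, combined_recall(presence, combo)) for combo in combinations(databases, k)]


# Hypothetical hand-search results for four included references.
presence: Presence = {
    "ref1": {"MEDLINE/PubMed", "CINAHL"},
    "ref2": {"CINAHL"},
    "ref3": {"Embase"},
    "ref4": set(),  # included in an SR but found in none of the databases checked
}
```

With these toy data, MEDLINE/PubMed alone has a recall of 25%, MEDLINE/PubMed plus CINAHL 50%, and CINAHL plus Embase 75%. Counting a reference once, however many databases in the combination retrieve it, is what keeps combined recall from exceeding 100%.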
Inclusion and exclusion of SRs and references
The initial PubMed search, using the search syntax defined in the Methods section, yielded 35 SRs. Nine SRs were excluded, as detailed in Fig. 1, leaving 26 SRs that met the inclusion criteria and were included in the study. All included SRs were published between 2010 and 2020. No correlation was found between the year of publication and the number of databases searched. See Appendix 1 for an overview of the included SRs. The 26 SRs contained a total of 707 references. Five references could not be extracted because two SRs reported including 85 references in total but listed only 80 in their reference lists. Of the remaining 702 references, 201 were excluded (see Fig. 1 for details), leaving 501 unique qualitative studies concerning diabetes mellitus. A median (IQR) of 12.5 (6 to 24) references were included per SR.
Databases and their frequency of use
The mean and median number of databases searched by the SRs were five and four, respectively, with a range from two to nine databases (Fig. 2).
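The summary statistics used in these results (mean, median, IQR, and range) can be computed with Python’s standard `statistics` module. The counts below are hypothetical and are not the 26 SRs’ data. Quartiles depend on the interpolation rule; `method="inclusive"` gives the same linear interpolation as R’s default `quantile()` (type 7), which is an assumption about how the article’s R analysis computed them. Note also that the article reports the IQR as the pair of quartiles (for example “one to six”) rather than as their difference.

```python
import statistics
from typing import Dict, List


def describe(counts: List[float]) -> Dict[str, float]:
    """Mean, median, quartiles and range, as used to summarise per-SR counts."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as R's default quantile() (type 7).
    q1, _, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "q1": q1,
        "q3": q3,
        "min": min(counts),
        "max": max(counts),
    }


# Hypothetical number of databases searched by six SRs.
summary = describe([2, 3, 4, 4, 5, 9])
print(f"mean {summary['mean']}, median {summary['median']} "
      f"(IQR {summary['q1']} to {summary['q3']}), range {summary['min']} to {summary['max']}")
```

For this toy list the sketch prints a mean of 4.5, a median of 4.0, an IQR of 3.25 to 4.75, and a range of 2 to 9. Python’s default `method="exclusive"` would give different quartiles for the same data, which is why the method is set explicitly.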
The 26 SRs searched 28 different databases, 12 of which were reported by more than one SR. MEDLINE/PubMed was the most frequently searched database, used by all SRs (100%), followed by CINAHL, which was searched by 21 of the 26 SRs (81%). Embase and PsycINFO were the third and fourth most frequently searched databases, each searched by 12 SRs (46%) (Fig. 3).
The use of information specialists and additional sources
Only one SR (4%) involved an information specialist when choosing databases. Another SR used a search filter developed by an information specialist, while the remaining 24 SRs did not mention using an information specialist. Eighteen of the 26 SRs (69%) searched the reference lists of included articles (two of these 18 did not present the results of these searches). This resulted in a median (IQR) of 2.5 (one to six) additional references being included per SR compared with database searches alone. In total, the 16 SRs that reported these results included 48 references found by searching the reference lists of included articles; these 48 references are part of the total of 501 references. Three SRs searched databases exclusively, while the remaining SRs also hand searched journals, key authors, and other sources.
Unique references per database
The seven databases MEDLINE/PubMed, CINAHL, Embase, PsycINFO, Web of Science, Scopus, and Google Scholar were investigated individually. A total of 9 references (1.8%) were unique to only one of these seven databases. Table 2 shows the number of unique references for each database. Embase retrieved the highest number of unique references, followed by CINAHL, Google Scholar, and MEDLINE/PubMed. When Google Scholar was excluded, CINAHL retrieved the highest number of unique references (n = 15), followed by Embase (n = 5) and MEDLINE/PubMed (n = 5).
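A reference counts as unique to a database when that database is the only one, among those being compared, that retrieved it. The sketch below uses hypothetical data and a hypothetical function name (`unique_counts`) to illustrate how excluding Google Scholar can raise the number of unique references: a reference retrieved only by CINAHL and Google Scholar becomes unique to CINAHL once Google Scholar is left out, while a reference retrieved only by Google Scholar is no longer unique to any database. This is one way the difference between 9 unique references across all seven databases and 27 across six can arise.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def unique_counts(presence: Dict[str, Set[str]], exclude: Iterable[str] = ()) -> Counter:
    """Count references retrieved by exactly one of the databases being compared."""
    excluded = set(exclude)
    counts: Counter = Counter()
    for found_in in presence.values():
        remaining = found_in - excluded
        if len(remaining) == 1:
            (only_db,) = remaining  # unpack the single remaining database
            counts[only_db] += 1
    return counts


# Hypothetical hand-search results: reference id -> databases that retrieved it.
presence = {
    "ref1": {"CINAHL"},
    "ref2": {"CINAHL", "Google Scholar"},
    "ref3": {"Embase"},
    "ref4": {"MEDLINE/PubMed", "Embase"},
    "ref5": {"Google Scholar"},
}

all_databases = unique_counts(presence)
without_gs = unique_counts(presence, exclude={"Google Scholar"})
```

With all databases compared, CINAHL, Embase, and Google Scholar each have one unique reference; without Google Scholar, CINAHL has two and Embase one.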
Search of databases and their overall recall
For each database and each combination of databases, recall rates were calculated for the 501 included references (Table 3). Google Scholar showed the highest overall recall rate (97%), and Scopus the second-highest (92%). Overall recall rates for the seven individual databases ranged from 39 to 97%. Among combinations of two databases, Google Scholar and Embase had the highest overall recall rate (99%); excluding Google Scholar, MEDLINE/PubMed and CINAHL had the highest (96%). Among combinations of three databases, the highest overall recall rate was achieved by Google Scholar and Embase combined with either MEDLINE/PubMed (99.6%) or CINAHL (99.6%); excluding Google Scholar, it was achieved by MEDLINE/PubMed, Embase, and CINAHL (98.8%).
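Finding the best two- and three-database combinations is small enough to do exhaustively: with seven databases there are only 21 pairs and 35 triples. The sketch below ranks every k-database combination by combined recall, optionally excluding Google Scholar, keeps ties (as with the two 99.6% triples above), and lists the combinations that reach a recall threshold such as 95%. The toy data were chosen only to exercise the code and are not the study’s results; `rank_combinations`, `best_ties`, and `meeting_threshold` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Presence = Dict[str, Set[str]]
Ranked = List[Tuple[float, Tuple[str, ...]]]


def hits(presence: Presence, combo: Iterable[str]) -> int:
    """Number of references retrieved by at least one database in combo."""
    chosen = set(combo)
    return sum(1 for found_in in presence.values() if found_in & chosen)


def rank_combinations(presence: Presence, databases: List[str], k: int,
                      exclude: Iterable[str] = ()) -> Ranked:
    """Every k-database combination with its combined recall (%), best first."""
    excluded = set(exclude)
    candidates = [db for db in databases if db not in excluded]
    scored = [(hits(presence, combo), combo) for combo in combinations(candidates, k)]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable, so ties keep input order
    return [(100.0 * h / len(presence), combo) for h, combo in scored]


def best_ties(ranked: Ranked) -> Tuple[float, List[Tuple[str, ...]]]:
    """Highest recall and all combinations that reach it."""
    top = ranked[0][0]
    return top, [combo for recall, combo in ranked if recall == top]


def meeting_threshold(ranked: Ranked, threshold: float = 95.0) -> List[Tuple[str, ...]]:
    """Combinations whose combined recall is at least `threshold` percent."""
    return [combo for recall, combo in ranked if recall >= threshold]


# Toy data chosen only to exercise the code; not the study's results.
DATABASES = ["MEDLINE/PubMed", "CINAHL", "Embase", "Google Scholar"]
presence: Presence = {
    "ref1": {"MEDLINE/PubMed", "CINAHL", "Google Scholar"},
    "ref2": {"CINAHL", "Google Scholar"},
    "ref3": {"CINAHL"},
    "ref4": {"MEDLINE/PubMed", "Google Scholar"},
    "ref5": {"Embase", "Google Scholar"},
    "ref6": {"MEDLINE/PubMed", "Google Scholar"},
}

pairs_all = rank_combinations(presence, DATABASES, 2)
pairs_no_gs = rank_combinations(presence, DATABASES, 2, exclude={"Google Scholar"})
triples_no_gs = rank_combinations(presence, DATABASES, 3, exclude={"Google Scholar"})
```

Hit counts are compared as integers before conversion to percentages, so combinations with equal hits get identical recall values and ties are detected reliably. An exhaustive search is only practical because the number of candidate databases is small; for many more databases a greedy set-cover heuristic would be a common substitute.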
Our study underlines the importance of choosing the optimal combination of databases when conducting a qualitative SR regarding diabetes mellitus. It has previously been suggested that a SR must include at least 95% of the publications on a given subject to be acceptable. With Google Scholar excluded from the analyses, we found that the combinations of MEDLINE/PubMed and CINAHL (96.4%) and of MEDLINE/PubMed, CINAHL, and Embase (98.8%) yielded the highest overall recall rates for two and three databases, respectively. However, other combinations of databases yielded similar recall rates and are expected to perform comparably. Furthermore, when Google Scholar was excluded, CINAHL retrieved the highest number of unique references (n = 15), followed by MEDLINE/PubMed (n = 5) and Embase (n = 5). Based on these findings, we recommend searching at least MEDLINE/PubMed and CINAHL (a combination used by 20 of the 26 SRs) when conducting qualitative SRs regarding diabetes mellitus.
These results contrast with a previous study concluding that the combination of Scopus, CINAHL and ProQuest Dissertations and Theses Global (hereafter ProQuest) contributed the highest number of unique references for qualitative SRs. ProQuest was not investigated in this study, as unpublished references were excluded. However, unpublished references comprised only two references, so ProQuest would not be expected to retrieve a high number of unique references. In line with previous findings [7, 26], our data showed that CINAHL retrieved the highest number of unique references (excluding Google Scholar), suggesting that CINAHL is highly relevant when searching for qualitative diabetes mellitus research. However, CINAHL focuses on nursing and allied health research, a scope that may be too narrow for multi- or interdisciplinary health science literature. In such cases, multidisciplinary databases such as Scopus and Web of Science may be higher yielding. Therefore, the nature of the research question should be carefully considered when deciding on the optimal combination of databases.
Google Scholar had the highest individual overall recall rate in this study (97%) and adds further value by identifying grey literature. However, databases such as ProQuest and GreySource offer similar access to grey literature. Despite these advantages, Google Scholar’s precision in structured literature searches has previously been reported to be low [22, 23]. Google Scholar also has several significant limitations, including a 256-character limit on search expressions, display of at most 1000 results without any explanation of how they are ranked, and no bulk export option. It has therefore previously been assessed as inadequate as a standalone resource for comprehensive searches such as SRs. We recommend using Google Scholar as a supplement to traditional scientific database searches to enhance retrieval of unique or unpublished references.
The majority of SRs searched the reference lists of included articles, similar to previous findings. It can be argued that a comprehensive search of the optimal combination of databases would make searching reference lists redundant, and our data on overall recall rates support this. However, the presence of a reference in a database does not guarantee that it will be found with a given search string. Searching reference lists therefore remains a valid way of identifying references not found by database searches alone. Only two of the 26 SRs used either an information specialist or a search filter developed by one. These results contrast with a prior quantitative study reporting that 51% of SRs used a librarian, although only 64% of these SRs actually reported this use. Although our findings are insufficient to support a firm recommendation, we suggest consulting an information specialist before conducting database searches, given that each database requires its own search string.
In this study, MEDLINE and PubMed were treated as one database because of their large overlap in references, to avoid misleading results. However, it might be relevant to treat them as separate databases when conducting academic literature searches. PubMed includes all MEDLINE references as well as up-to-date citations, books and book chapters, and references from journals not indexed in MEDLINE, such as PMC journals [14, 29]. The larger content of PubMed compared with MEDLINE (91% of PubMed content is indexed in MEDLINE) might yield more relevant references when conducting a SR. On the other hand, PMC literature has been criticized for potentially reducing the quality of PubMed because of PMC’s informal re-evaluation process (prior to 2017), although most manuscripts in PMC are also published in MEDLINE-indexed journals. For these reasons, we recommend using PubMed rather than MEDLINE.
This study has several limitations. Firstly, the SRs included in this study were identified only through PubMed; other databases were not searched for SRs of qualitative research regarding diabetes mellitus. Secondly, the search string used only MeSH terms and may therefore not have retrieved all qualitative SRs of diabetes mellitus in PubMed, which could have introduced selection bias. However, as this is an exploratory study and not a SR or meta-analysis, a sample of the retrievable SRs was considered sufficient. Thirdly, since we only investigated SRs of qualitative research regarding diabetes mellitus, our results may not apply to other diseases or research topics. Fourthly, not all databases were investigated in this study; for example, SSCI and British Nursing Index, the sixth and eighth most frequently searched databases, were not included. Combinations including these databases might have led to different conclusions. Fifthly, the presence of a reference in a database does not guarantee that it would have been found using a given search string. Therefore, our results may not be directly transferable to the search for references when conducting a SR. Sixthly, the recall rates in this study were derived from the references included in the SRs, not from all references that were available and relevant to the same SRs at the time they were conducted. There were likely relevant references on qualitative research on diabetes mellitus that were not included in the SRs; had they been included in our study, they might have altered the results and recommendations for database selection.
We found that, when Google Scholar was excluded from the analyses, the combinations of MEDLINE/PubMed and CINAHL (96.4%) and of MEDLINE/PubMed, CINAHL, and Embase (98.8%) yielded the highest overall recall rates of references included in SRs of qualitative research regarding diabetes mellitus. Other combinations of databases did, however, yield similar recall rates and are expected to perform comparably. Google Scholar can be a useful supplement to traditional scientific databases to ensure optimal and comprehensive retrieval of relevant references, both academic and grey literature. Further research should establish whether our findings within diabetes mellitus hold for other disease areas within qualitative research.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
CINAHL: Cumulative Index to Nursing and Allied Health Literature
COPD: Chronic Obstructive Pulmonary Disease
HIV: Human Immunodeficiency Virus
RCT: Randomized controlled trial
SSCI: Social Sciences Citation Index
WoS: Web of Science
References
1. Muka T, Glisic M, Milic J, Verhoog S, Bohlius J, Bramer W, et al. A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. Eur J Epidemiol. 2020;35(1):49–60. https://doi.org/10.1007/s10654-019-00576-5.
2. Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2 (updated February 2021). Cochrane. 2021. Available from www.training.cochrane.org/handbook.
3. Haddaway NR, Collins AM, Coughlin D, Kirk S. The role of google scholar in evidence reviews and its applicability to grey literature searching. PLoS One. 2015;10(9):1–17.
4. Bramer WM, de Jonge GB, Rethlefsen ML, Mast F, Kleijnen J. A systematic approach to searching: an efficient and complete method to develop literature searches. J Med Libr Assoc. 2018;106(4):531–41. https://doi.org/10.5195/jmla.2018.283.
5. Koffel JB. Use of recommended search strategies in systematic reviews and the impact of librarian involvement: a cross-sectional survey of recent authors. PLoS One. 2015;10(5):e0125931.
6. Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ. Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol. 2015;68(6):617–26. https://doi.org/10.1016/j.jclinepi.2014.11.025.
7. Frandsen TF, Gildberg FA, Tingleff EB. Searching for qualitative health research required several databases and alternative search strategies: a study of coverage in bibliographic databases. J Clin Epidemiol. 2019;114:118–24. https://doi.org/10.1016/j.jclinepi.2019.06.013.
8. Wright JM, Cottrell DJ, Mir G. Searching for religion and mental health studies required health, social science, and grey literature databases. J Clin Epidemiol. 2014;67(7):800–10. https://doi.org/10.1016/j.jclinepi.2014.02.017.
9. Bramer WM, Rethlefsen ML, Kleijnen J, Franco OH. Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev. 2017;6(1):1–12.
10. Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. The contribution of databases to the results of systematic reviews: a cross-sectional study. BMC Med Res Methodol. 2016;16(1):1–13.
11. Vassar M, Yerokhin V, Sinnett PM, Weiher M, Muckelrath H, Carr B, et al. Database selection in systematic reviews: an insight through clinical neurology. Health Inf Libr J. 2017;34(2):156–64. https://doi.org/10.1111/hir.12176.
12. Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol. 2015;68(9):1076–84. https://doi.org/10.1016/j.jclinepi.2014.12.017.
13. Aagaard T, Lund H, Juhl C. Optimizing literature search in systematic reviews - are MEDLINE, EMBASE and CENTRAL enough for identifying effect studies within the area of musculoskeletal disorders? BMC Med Res Methodol. 2016;16(1):1–11.
14. Frandsen TF, Eriksen MB, Hammer DMG, Christensen JB. PubMed coverage varied across specialties and over time: a large-scale study of included studies in Cochrane reviews. J Clin Epidemiol. 2019;112:59–66. https://doi.org/10.1016/j.jclinepi.2019.04.015.
15. Frandsen TF, Brandt M, Mortan D, Hammer G, Buck J, Albert J. Using Embase as a supplement to PubMed in Cochrane reviews differed across fields. J Clin Epidemiol. 2021;133:24–31. https://doi.org/10.1016/j.jclinepi.2020.12.022.
16. Royle P, Bain L, Waugh N. Systematic reviews of epidemiology in diabetes: finding the evidence. BMC Med Res Methodol. 2005;5(2):1–6.
17. Saeedi P, Petersohn I, Salpea P, Malanda B, Karuranga S, Unwin N, et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9(th) edition. Diabetes Res Clin Pract. 2019;157:107843.
18. Cramer JA. A systematic review of adherence with medications for diabetes. Diabetes Care. 2004;27(5):1218–24. https://doi.org/10.2337/diacare.27.5.1218.
19. Vanstone M, Giacomini M, Smith A, Brundisini F, DeJean D, Winsor S. How diet modification challenges are magnified in vulnerable or marginalized people with diabetes and heart disease: a systematic review and qualitative meta-synthesis. Ont Health Technol Assess Ser. 2013;13(14):1–40.
20. DeJean D, Giacomini M, Vanstone M, Brundisini F. Patient experiences of depression and anxiety with chronic disease: a systematic review and qualitative meta-synthesis. Ont Health Technol Assess Ser. 2013;13(16):1–33.
21. Williamson PO, Minter CIJ. Exploring PubMed as a reliable resource for scholarly communications services. J Med Libr Assoc. 2019;107(1):16–29. https://doi.org/10.5195/jmla.2019.433.
22. Boeker M, Vach W, Motschall E. Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough. BMC Med Res Methodol. 2013;13:131.
23. Shultz M. Comparing test searches in PubMed and Google scholar. J Med Libr Assoc. 2007;95(4):442–5. https://doi.org/10.3163/1536-5050.95.4.442.
24. Messina J, Campbell S, Morris R, Eyles E, Sanders C. A narrative systematic review of factors affecting diabetes prevention in primary care settings. PLoS One. 2017;12(5):e0177699.
25. Vanstone M, Rewegan A, Brundisini F, Giacomini M, Kandasamy S, Dejean D. Diet modification challenges faced by marginalized and nonmarginalized adults with type 2 diabetes: a systematic review and qualitative meta-synthesis. Chronic Illn. 2017;13(3):217–35. https://doi.org/10.1177/1742395316675024.
26. Wright K, Golder S, Lewis-Light K. What value is the CINAHL database when searching for systematic reviews of qualitative studies? Syst Rev. 2015;4(1):1–8.
27. Salisbury L. Web of science and Scopus: a comparative review of content and searching capabilities. Charlest Advis. 2009;11(1):5–18.
28. Page MJ, Shamseer L, Altman DG, Tetzlaff J, Sampson M, Tricco AC, et al. Epidemiology and reporting characteristics of systematic reviews of biomedical research: a cross-sectional study. PLoS Med. 2016;13(5):e1002028.
29. US National Library of Medicine. MEDLINE, PubMed and PMC (PubMed Central) - How are they different? Fact Sheet. 2016. p. 1. Available from: https://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html. [cited 2021 Mar 4]
The authors thank Caroline Margaret Moos for her assistance in the editing of the manuscript.
The authors did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Furthermore, no sponsors were involved in conduct of the research or preparation of the article.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
About this article
Cite this article
Justesen, T., Freyberg, J. & Schultz, A.N.Ø. Database selection and data gathering methods in systematic reviews of qualitative research regarding diabetes mellitus - an explorative study. BMC Med Res Methodol 21, 94 (2021). https://doi.org/10.1186/s12874-021-01281-2
- Qualitative research
- Systematic review
- Literature search
- Diabetes mellitus