Evaluating the role of quality assessment of primary studies in systematic reviews of cancer practice guidelines



The purpose of this study was to evaluate the role of study quality assessment of primary studies in cancer practice guidelines.


Reliable and valid study quality assessment scales were sought and applied to published reports of trials included in systematic reviews of cancer guidelines. Sensitivity analyses were performed to evaluate the relationship between quality scores and pooled odds ratios (OR) for mortality and need for blood transfusion.


Whether trials were classified as high or low quality depended on the scale used to assess them. Although the sensitivity analyses found some variation in the ORs observed, the confidence intervals (CIs) of the pooled effects from each of the analyses of high quality trials overlapped with the CI of the pooled odds of all trials. Quality score was not predictive of the pooled ORs studied here.


Had sensitivity analyses based on study quality been conducted prospectively, it is highly unlikely that different conclusions would have been found or that different clinical recommendations would have emerged in the guidelines.


Quality assessment of trials included in systematic reviews of evidence is a resource-intensive and scientifically controversial endeavour. On the one hand, the routine use of quality assessment in the development of systematic reviews is encouraged by the Evidence-based Practice Center Program of the Agency for Healthcare Research & Quality (AHRQ) and the Cochrane Collaboration, two well-respected groups that coordinate a substantial number of systematic reviews [1–3]. Indeed, West et al released a 2002 evidence report, sponsored by the AHRQ, that compares and contrasts various systems for rating the strength and quality of research evidence to assist in these activities [4]. Furthermore, many journal editors consider it important to include an assessment of study quality in reports featuring meta-analyses [3].

The concept of incorporating study quality assessment into systematic review methodology has also found empirical support. There is evidence that studies of lower methodological quality tend to report larger treatment effects than high quality studies [5–7]. For example, Moher and colleagues found a 34% greater estimate of treatment effect for low quality versus high quality trials and a 37% greater estimate of treatment effect for inadequately concealed versus adequately concealed trials associated with reviews addressing a variety of clinical conditions [5]. Similar bias was found by Schulz and colleagues [6] in their analysis of trials included in the Cochrane Collaboration's Childbirth and Pregnancy reviews. In addition, Colditz and colleagues found that nonrandomized and open studies were more likely to produce positive treatment effects than randomized and double-blinded studies [7].

Although this seminal work yields compelling results, these findings are not universal and the issue is not without detractors [8–14]. Some studies have found no reliable relationship between quality score and effect size [10–12] and another has found that low study quality was associated with diminished effect sizes [13]. Further, Juni et al [14] found the relationship between study quality and effect size depended on the scale used in the assessment.

Together these results suggest that the study quality issue is controversial and that the merits of this methodological step in systematic reviews require thoughtful analysis. Indeed, West et al conclude with recommendations advocating for research dedicated to comparing quality rating systems and the role of quality assessment within individual clinical contexts, and for studies targeted at determining specific quality factors that make a difference in final quality scores [4].

The Practice Guidelines Initiative of Cancer Care Ontario's Program in Evidence-based Care (PEBC) uses the Guidelines Development Cycle to create cancer practice guidelines comprising a systematic review of the research literature, an interpretation of and consensus on the evidence by members of the guideline development team, clinical recommendations informed by the evidence, and an external review process by Ontario clinicians [15–19]. We face the challenge of balancing scientific rigour and the timely production of guideline documents in an environment defined by limited financial and human resources. Hence, we try to approach our methodological decisions with a critical scientific and practical eye. We took note of the growing controversy in the study quality assessment literature and conducted the evaluation reported below to assess the benefits of rating the quality of each study included in our systematic reviews. Our overall objective was to decide whether to augment our current practice of simply describing study characteristics by also incorporating study quality assessment as a routine formal component of the guideline development methodology. The evaluation was conducted in three steps, each designed to address one of three specific issues:

1. What valid and reliable quality assessment instrument would be most appropriate for our context?

2. How is study quality currently being used in published systematic reviews of cancer trials and what is the relationship between effect size and study quality in this disease area?

3. What impact would study quality assessment have on the clinical recommendations made in evidence-based practice guidelines developed by the PEBC?


Search for a valid and reliable quality assessment tool for the PEBC context

For a comprehensive review of the strengths and weaknesses of quality assessment instruments, readers are referred to the 2002 West et al. evidence report commissioned by the AHRQ [4]. For our study, which began before the release of this report, we used components of Moher et al's definition of study quality that are related to internal validity (i.e., design, conduct and analysis) [20] and updated his 1992 published reports of check lists and scales used to measure the phenomenon [21]. We searched the Medline database using the following search strategy: (quality adj rat:).tw OR (quality adj assess:).tw. OR (quality adj scale:).tw. OR (quality adj checklist:).tw. AND randomized controlled OR clinical OR Reference lists of reviews were scanned for additional citations.

Systematic review of the oncology literature on study quality

To locate systematic reviews on oncology topics, the strategy suggested by Moher et al for finding systematic reviews [3] was combined with the terms " OR OR" to search the Medline, CINAHL, and Cancerlit databases. To ascertain if the authors assessed the quality of the studies included in the systematic reviews, the search was narrowed to include the terms [(quality adj rat:).tw OR (quality adj assess:).tw. OR (quality adj scale:).tw. OR (quality adj checklist:).tw. OR (study adj quality).tw.]. Textwords were used to search the Cochrane Library for systematic reviews on oncology topics. Systematic reviews that included analyses exploring the relationship between study quality (using any assessment instrument, not just validated tools that met our criteria as described above) and effect size were examined. Because survival following cancer treatment is commonly used as the primary outcome variable in our practice guidelines, this variable was selected as the primary outcome measure of interest.

Impact of study quality on PEBC practice guidelines

The validated scales were applied by two methodologists (MJ and MC) to studies reported in any of our practice guidelines that included a pooled analysis based on at least ten randomized trials related to the main guideline question. Intraclass correlation coefficients with 95% confidence intervals (CI) were calculated using a random sample of the RCTs to assess inter-rater reliability, with one coefficient calculated for each of the scales used. Because of budgetary limitations for staff time, the analysis was conducted on 18 randomly selected studies rather than on the whole group of articles. This is a methodological limitation, as fewer studies result in wider confidence intervals and less precise estimates. This may account for the difference in reliability ratings we found for the Sindhu scale compared to published norms (see Table 1).

Table 1 Quality assessment tools with published validity and reliability data – information reported by the scale developers

To assess the impact of study quality on effect sizes, sensitivity analyses were conducted for the meta-analysis from each guideline report. For each scale, studies were divided into two groups (low quality and high quality) based on total quality score. Where the scale developer suggested a cut-off point for low versus high quality, this was used. Where no cut point was specified, the observed median study quality score was used as the dividing point between low and high quality. Meta-analyses were repeated with the high quality studies. Because there would never be a situation in which guideline developers would consider low quality studies only, a meta-analysis using this sample of the studies was not conducted.
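The sensitivity analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the source does not state which pooling model was used, so the fixed-effect inverse-variance method here is an assumption, as is the choice to count only scores strictly above the cut-off as high quality. The function names and the trial data are hypothetical.

```python
import math
from statistics import median


def pooled_odds_ratio(trials):
    """Pool odds ratios across trials (assumed fixed-effect, inverse-variance).

    Each trial is a tuple: (events_treated, n_treated, events_control, n_control).
    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, n_treated, c, n_control in trials:
        b, d = n_treated - a, n_control - c
        # Apply a 0.5 continuity correction when any cell is zero.
        if 0 in (a, b, c, d):
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weighted_sum += log_or / variance
        weight_total += 1 / variance
    pooled = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))


def high_quality_subset(trials, scores, cutoff=None):
    """Keep trials whose quality score exceeds the cut-off.

    If the scale developer gives no cut-off, fall back to the observed median.
    Assumption: scores strictly above the cut-off count as high quality.
    """
    if cutoff is None:
        cutoff = median(scores)
    return [trial for trial, score in zip(trials, scores) if score > cutoff]


# Hypothetical data for illustration only (not taken from the guidelines).
trials = [(10, 100, 20, 100), (15, 120, 18, 118), (30, 200, 45, 205), (8, 60, 9, 62)]
scores = [2, 5, 3, 6]

print("All trials:   OR %.2f (95%% CI %.2f to %.2f)" % pooled_odds_ratio(trials))
print("High quality: OR %.2f (95%% CI %.2f to %.2f)"
      % pooled_odds_ratio(high_quality_subset(trials, scores)))
```

Passing an explicit `cutoff` mirrors the case where the scale developer suggests a threshold; omitting it reproduces the median split.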


Valid and reliable quality assessment tools for our context

Four scales meeting our criteria were found; two instruments, Jadad et al [22] and Cho & Bero [23], were originally uncovered in the Moher review [21] and two instruments, Sindhu et al [24] and Downs & Black [25], were uncovered in our update of this review. While none of these scales was developed in the oncology setting, they all purport to be generic assessment tools that measure the quality of specific study designs regardless of the clinical condition reflected in the design. The procedures undertaken to create the instruments followed appropriate methodological processes for questionnaire design. In addition, while a number of additional scales and checklists emerged from our search, validity and reliability data were not reported for them. Because our practice guidelines are based primarily on evidence from randomized trials, we decided to employ only the scales that focused specifically on RCTs. As such, the Cho & Bero scale, which is applicable to a range of study designs, was not employed here but will be considered at a later date when we have a portfolio of diverse study designs. The characteristics of the instruments included in our study are summarized in Tables 1 and 2, and detailed descriptions and comparisons can be found in West et al. [4].

Table 2 Quality assessment tools – comparison of key quality constructs

The relationship between study quality and effect size in the oncology literature

The literature review located 32 published systematic reviews on oncology-related topics that included some measure of study quality. Five of the reviews examined changes in pooled estimates of effect size of mortality rates when meta-analysis was restricted to high-quality randomized trials [26–30]. As shown in Table 3, four of the five reviews found somewhat larger effects (i.e., larger differences between experimental and control groups) with high-quality trials compared to all trials [26–29]. With one exception, the statistical relevance of the differences between the groups (i.e., significant differences or no significant differences) remained the same regardless of the number of trials included. Specifically, two of the reviews did not detect a statistically significant difference in survival between groups when all studies were included or when the meta-analysis was restricted to high-quality studies [26, 29]. For one data set, the meta-analysis was repeated with study quality ratings used as weights [29]; there was still no significant difference between experimental and control groups. Two analyses detected significant differences between experimental and control treatments with analysis of all trials and when the analysis was restricted to high-quality trials [27, 28]. In the fifth review, a significant difference between experimental and control interventions was detected when all trials were synthesized, but this became only marginally significant (p < .07) when the meta-analysis was adjusted for study quality [30].

Table 3 Systematic reviews of randomized oncology trials with sensitivity analysis exploring the relationship between study quality scores and effect sizes for mortality

Impact of study quality on PEBC practice guidelines

Three of the PEBC practice guidelines included at least 10 RCTs in their systematic reviews of the evidence and were eligible for inclusion in this evaluation [31–33]: concomitant chemotherapy and radiotherapy in squamous cell head and neck cancer (18 trials) [31]; adjuvant therapy for stage II colon cancer following complete resection (11 trials) [32]; and neoadjuvant chemotherapy in locally advanced squamous cell carcinoma of the head and neck (23 trials) [33]. For the last of these guidelines [33], the data could not be reliably reconstructed, so it is not discussed further.

At the conclusion of our study, we identified a fourth practice guideline which originally did not meet our 10-RCT inclusion criterion, but later did so after it was updated. The guideline focused on the role of erythropoietin (EPO) in the management of cancer patients with non-hematologic malignancies [34]. Unlike the chemotherapy trials included in the practice guidelines described above, which were not placebo-controlled and where the primary outcome was death, one-third of the EPO trials were double blind and all used the need for blood transfusion as the primary outcome. Although by the time this practice guideline emerged as eligible we had identified a preferred scale (see below), we chose to include it here and apply only the preferred scale as a demonstration of its use on a report whose characteristics differed from those of the chemotherapy topics covered.

Inter-rater reliability

Intraclass correlation coefficients used to establish inter-rater reliability were 0.71 for the 3-item Jadad scale (95% CI, 0.38 to 0.88), 0.80 for the 6-item Jadad scale (95% CI, 0.54 to 0.92), 0.62 for the Sindhu scale (95% CI, 0.24 to 0.84), and 0.63 for the Downs & Black scale (95% CI, 0.25 to 0.84). Disagreements were resolved by consensus. Where consensus could not be reached, a third rater (MB) assessed the items and provided the tiebreaker score.
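For readers unfamiliar with the statistic, the following Python sketch shows how an intraclass correlation coefficient can be computed from two raters' total scores. The source does not say which ICC form was used, so the two-way random-effects, single-rater, absolute-agreement form (often written ICC(2,1)) is an assumption here. The function name and the ratings are hypothetical, and the sketch does not compute the confidence intervals reported above.

```python
def icc_two_way_random(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is a list of rows, one per study, each holding one score per rater.
    Assumption: this ICC form is illustrative; the source does not name the form used.
    """
    n = len(ratings)        # number of studies rated
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total quality scores from two raters for six studies.
scores_by_study = [[3, 4], [5, 5], [2, 3], [6, 5], [4, 4], [1, 2]]
print(round(icc_two_way_random(scores_by_study), 2))
```

With only 18 studies in the reliability sample, an estimate of this kind carries the wide confidence intervals noted in the methods.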

Application of quality scales to primary studies informing practice guidelines

While the total quality scores emerging from each of the different scales did all significantly correlate with one another (range r = .35 to r = .73), there was considerable variation in the classification of studies as high quality or low quality as a function of the scale that was applied (Table 4). For example, of the 11 comparisons from 11 trials comprising the stage II colon cancer review, the application of the Jadad 3-item, the Jadad 6-item, the Sindhu, and Downs & Black scales yielded 0, 8, 9, and 6 of these as high quality, respectively. The 6 studies categorized as high quality using the Downs & Black tool were also categorized as high quality when the Jadad 6-item and Sindhu scales were applied. Similarly, the 8 studies categorized as high by the Jadad 6-item were also categorized as high quality by the Sindhu scale.

Table 4 Meta-analysis of all trials and high-quality trials from evidence-based practice guidelines

The 20 comparisons from the 18 trials included in the head and neck concomitant therapy systematic review yielded 2, 14, 12 and 14 high quality studies, respectively, when the Jadad 3-item, the Jadad 6-item, the Sindhu and the Downs & Black scales were used. Although the Jadad 6-item and Downs & Black scales both assessed 14 comparisons to be from high quality studies, only 11 of these 14 studies were the same. For the 12 comparisons from studies categorized as high quality with the Sindhu scale, 10 of these were also rated high quality by both the Jadad 6-item and the Downs & Black scales; the other 2 were rated as high quality by the Jadad 6-item scale only. There was 1 comparison from a study rated as high quality by the Jadad 6-item scale only and two from studies rated as high quality by the Downs & Black scale only.

Impact on pooled estimates of outcome measures

Mortality data (i.e., numbers of deaths and number of patients randomized for each allocation group, abstracted from published trial reports) used for the meta-analysis included in the guideline reports were available for two guidelines and need for blood transfusion data were available for the third [31, 32, 34]. For each guideline, the pooled odds ratio based on only the high-quality trials was compared with the odds ratio from meta-analysis of all trials that had been included originally in the review (Table 4). For the first guideline [31], there was a significant survival benefit for concomitant chemotherapy and radiotherapy compared with radiotherapy alone for squamous cell head and neck cancer in the meta-analyses that included all studies and the meta-analyses restricted to high quality studies, regardless of the quality appraisal tool used. Although the effect size was larger for meta-analysis of high-quality RCTs than for all RCTs (irrespective of quality scale used), the confidence intervals of the two calculations overlapped and the overall conclusions and the recommendations informed by the meta-analysis would have been the same. For the second guideline [32], no survival benefit was detected for adjuvant chemotherapy compared to standard therapy for stage II colon cancer in the meta-analysis of all the studies or of the high quality studies, again regardless of the quality appraisal tool used. Although the meta-analysis of the high quality studies was associated with smaller effect sizes than the calculation including all of the studies, the confidence intervals overlapped and the conclusions and the recommendations would have remained the same.

Only the 6-item Jadad scale was applied to the studies of the EPO guideline and the data were pooled to calculate an overall risk ratio for blood transfusion [34]. The risk ratio for all 15 trials was 0.57 (95% CI, 0.47 to 0.70); for nine trials that scored more than three out of eight on the 6-item Jadad scale, the risk ratio was also 0.57 (95% CI, 0.44 to 0.72) (see Table 4).
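The comparison of confidence intervals can be made concrete with the EPO figures reported above. The short Python sketch below checks whether the two intervals overlap and back-calculates the approximate standard error on the log scale from each interval; the function names are hypothetical, and the back-calculation assumes the intervals were built as estimate ± 1.96 standard errors on the log scale. Because the high quality trials are a subset of all trials, the two estimates are not independent, so overlap is a descriptive check rather than a formal test.

```python
import math


def intervals_overlap(ci_a, ci_b):
    """Return True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def log_scale_se(ci, z=1.96):
    """Approximate standard error of a log ratio, recovered from its 95% CI.

    Assumption: the interval was computed as exp(estimate +/- z * SE).
    """
    lower, upper = ci
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Risk ratios for blood transfusion reported in the EPO guideline analysis.
all_trials_ci = (0.47, 0.70)      # all 15 trials, RR 0.57
high_quality_ci = (0.44, 0.72)    # nine higher-scoring trials, RR 0.57

print(intervals_overlap(all_trials_ci, high_quality_ci))
print(round(log_scale_se(all_trials_ci), 3), round(log_scale_se(high_quality_ci), 3))
```

The larger standard error for the restricted analysis reflects the smaller number of trials contributing to it.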


Several conclusions can be drawn from this study and review of the literature. First, there are established methods for assessing the quality of randomized controlled trials for which data on adequate reliability and validity are available. West et al uncovered 32 scales, checklists and component systems concerned with evaluating RCTs [4], considerably more than the four instruments we identified here. Although most (87%) of the instruments found by West included quality domains for which there is an empirical basis, most failed to report the use of rigorous methods in their development and most failed to report data regarding reliability and validity, criteria we set for our study. Interestingly, West et al did not include the Jadad 6-item scale in their analysis [4], although the Jadad 3-item, Downs & Black, and Sindhu tools were reported.

Although all of the scales we used have established reliability and validity estimates, we found that the number of trials categorized as high quality or low quality depended specifically on the scale that was applied. For the head and neck cancer systematic review, the number of comparisons from high quality studies ranged from 2 (when the Jadad 3-item scale was applied) to 14 (when the Jadad 6-item or Downs & Black scales were applied). The range for the colon cancer review was 0 to 9. There was also considerable variability regarding the specific quality category in which each trial was placed. These findings are consistent with those of Juni et al [14] and suggest caution should be applied if the intent of quality rating scales is to restrict the number of studies considered in the systematic review; clearly the choice of scale will have a significant impact on which studies are eligible. The problem of identifying the quality category, high or low, in which studies should be placed is exacerbated by the lack of clear cut-off criteria identified by the instrument developers. This poses a significant methodological limitation to the utility of these instruments. In our study, we chose the median score as the cut-off criterion in situations where none was reported. However, it would be useful for researchers of these tools to continue the development work to create the evidence base from which valid criteria can be established.

The lack of consistency in study classification from one scale to the next, and the lack of clear cut-off criteria for users to employ when measuring the quality of studies, present a challenge to guideline developers when they need to choose which instrument to adopt, if they choose to adopt an instrument at all. Rather than clear evidence driving our decisions, we considered other features of the instruments in our decision making. Of the rating scales we examined, our preferred choice would be the Jadad 6-item instrument. In contrast to the others considered in this report, this instrument is relatively easy to implement and interpret, and good inter-rater reliability was established. Further, although the 3-item version of the Jadad scale is most commonly used, we found the original 6-item version to be more relevant in our clinical context as it provides greater variation in scores. In the cancer discipline, few trials are placebo-controlled and treatment allocation tends to be poorly reported. In contrast to the pain trials which were profiled in the development phase of the Jadad instrument, the majority of the items in the 3-item version (randomization and blinding items) yield no variation in scores in our context and are, therefore, not useful to discriminate among cancer trials. The 6-item version of the scale more aptly differentiates quality across studies and includes more quality domains for which an empirical basis has been established [4].

Another conclusion that can be drawn from this study is that effect size can be related to study quality but that the nature of the relationship in one clinical area may not generalize to another clinical area. Some of the original work examining the role of study quality reinforces the need to be mindful of the variation among studies included in a systematic review [5–7]. However, when we examined five published reviews that had conducted sensitivity analyses on pooled mortality data from RCTs, four of these found that larger effect sizes were associated with high-quality studies, not lower quality trials as convention would suggest, and the absence or presence of statistical differences between the two allocation groups remained constant. One of the challenges in examining this work is that the number of high quality studies is limited; there is a reduction in power that subjects the point estimates to bias. Nonetheless, the potential bias of study design and quality requires thoughtful consideration within a given clinical field.

We conducted sensitivity analysis on the systematic reviews comprising the guidelines developed by the PEBC. Only four systematic reviews among 36 eligible practice guidelines included more than 10 trials with data appropriate for pooling; three from which we could extract data. Although there was some variation in the odds ratios observed, the confidence intervals of the pooled effects from each of the analyses of high quality trials overlapped with the confidence intervals of the pooled odds with all of the trials. In no case would the conclusions based on these results be affected by restricting the meta-analysis to only high quality studies; the recommendations remained the same. Had sensitivity analysis based on study quality been conducted prospectively, it is highly unlikely that different conclusions would have been drawn from the systematic review or that different clinical practice guidelines would have been formulated.

Together, these findings lead to our final conclusion that measuring study quality did not translate into altered conclusions from a systematic review in the oncology domain for the outcomes we used here. Thus, at this time we have decided that measuring study quality using a numerical assessment scale for the purposes of sensitivity analysis will not be a routine part of our guideline development program. We will, however, encourage guideline developers to describe the variation among studies and to point out methodologic flaws. In addition, it will be important for us to repeat this study looking at other outcome measures, such as quality of life and adverse effects, as they become more routinely reported in primary cancer research and incorporated into our practice guidelines. Outcomes other than those studied here may be more sensitive to the issues of study quality.

This study highlights a strategy that guideline programs may find useful when making decisions about the methods employed in their guideline development process. It is important that scientific inquiry be maintained in studying the value and role of study quality assessment rather than accepting its role as convention. By exploring it within a specific clinical context, one can identify its most appropriate application.


  1. Lohr KN, Carey TS: Assessing "best evidence": issues in grading the quality of studies for systematic reviews. Jt Comm J Qual Improv. 1999, 25: 470-479.

  2. Clarke M, Oxman AD, editors: Quality assessment of studies. Cochrane Reviewers Handbook 4.1.2 [updated March 2001]; Section 6. The Cochrane Library. 2001, Oxford: Update Software, Updated quarterly, 2

  3. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, Pham B, Klassen TP: Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999, 3: 1-98. i-iv

  4. West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, Lux L: Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute – University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011). AHRQ Publication No. 02-E016. 2002, Rockville, MD: Agency for Healthcare Research and Quality

  5. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP: Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses?. Lancet. 1998, 352: 609-613. 10.1016/S0140-6736(98)01085-X.

  6. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995, 273: 408-412. 10.1001/jama.273.5.408.

  7. Colditz GA, Miller JN, Mosteller F: How study design affects outcomes in comparisons of therapy. Stat Med. 1989, 8: 441-466.

  8. Ioannidis JP, Lau J: Can quality of clinical trials and meta-analyses be quantified?. Lancet. 1998, 352: 590-591. 10.1016/S0140-6736(98)22034-4.

  9. Greenland S: Quality scores are useless and potentially misleading. Am J Epidemiol. 1994, 140: 300-301.

  10. Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC: An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Control Clin Trials. 1990, 11: 339-352. 10.1016/0197-2456(90)90175-2.

  11. Verhagen AP, de Vet HC, Vermeer F, Widdershoven JWMG, de Bie RA, Kessels AGH, Boers M, van den Brandt PA: The influence of methodologic quality on the conclusion of a landmark meta-analysis on thrombolytic therapy. Int J Technol Assess Health Care. 2002, 18 (1): 11-23.

  12. Balk EM, Bonis PAL, Moskowitz H, Schmid CH, Ioannidis JPA, Wang C, Lau J: Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002, 287 (22): 2973-2982. 10.1001/jama.287.22.2973.

  13. Verhagen AP, de Vet HC, de Bie RA, Lenssen AF, Kessels AG, Boers M, van den Brandt P: Impact of quality items on study outcome: treatments in acute lateral ankle sprains. Conference Proceedings of the First Symposium on Systematic Reviews: Beyond the Basics. 1998, Oxford, [abstract]

  14. Juni P, Witschi A, Bloch R, Egger M: The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999, 282 (11): 1054-1060. 10.1001/jama.282.11.1054.

  15. Browman GP, Levine MN, Mohide EA, Hayward RS, Pritchard KI, Gafni A, Laupacis A: The practice guidelines development cycle: a conceptual tool for practice guidelines development and implementation. J Clin Oncol. 1995, 13: 502-512.

  16. Browman GP, Newman TE, Mohide EA, Graham I, Levine MN, Cowan DH: Progress of Clinical Oncology Guidelines Development Using the Practice Guidelines Development Cycle: The Role of Practitioner Feedback. J Clin Oncol. 1998, 16 (3): 1226-1231.

  17. Browman G, Brouwers M, De Vito C, Johnston M, Graham I: Participation Patterns of Oncologists in the Development of Clinical Practice Guidelines. Curr Oncol. 2000, 7 (4): 252-257.

  18. Pater JL, Browman GP, Brouwers MC, Nefsky MF, Evans WK, Cowan DH: Funding New Cancer Drugs in Ontario: Closing the loop in the Practice Guidelines Development Cycle. J Clin Oncol. 2001, 19 (14): 3392-3396.

  19. Browman GP: Development and aftercare of clinical guidelines: the balance between rigor and pragmatism. JAMA. 2001, 286: 1509-1511. 10.1001/jama.286.12.1509.

  20. Moher D, Jadad AR, Tugwell P: Assessing the quality of randomized controlled trials. Current issues and future directions. Int J Technol Assess Health Care. 1996, 12: 195-208.

  21. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S: Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995, 16: 62-73. 10.1016/0197-2456(94)00031-W.

  22. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ: Assessing the quality of reports of randomized clinical trials: is blinding necessary?. Control Clin Trials. 1996, 17: 1-12. 10.1016/0197-2456(95)00134-4.

  23. Cho MK, Bero LA: Instruments for assessing the quality of drug studies published in the medical literature. JAMA. 1994, 272: 101-104. 10.1001/jama.272.2.101.

  24. Sindhu F, Carpenter L, Seers K: Development of a tool to rate the quality assessment of randomized controlled trials using a Delphi technique. J Adv Nurs. 1997, 25: 1262-1268. 10.1046/j.1365-2648.1997.19970251262.x.

  25. Downs SH, Black N: The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998, 52: 377-384.

  26. McAlister FA, Clark HD, Wells PS, Laupacis A: Perioperative allogeneic blood transfusion does not cause adverse sequelae in patients with cancer: a meta-analysis of unconfounded studies. Br J Surg. 1998, 85: 171-178. 10.1046/j.1365-2168.1998.00698.x.

  27. Caubet JF, Tosteson TD, Dong EW, Naylon EM, Whiting GW, Ernstoff MS, Ross SD: Maximum androgen blockade in advanced prostate cancer: a meta-analysis of published randomized controlled trials using nonsteroidal antiandrogens. Urology. 1997, 49: 71-78. 10.1016/S0090-4295(96)00325-1.

  28. Dube S, Heyen F, Jenicek M: Adjuvant chemotherapy in colorectal carcinoma: results of a meta-analysis. Dis Colon Rectum. 1997, 40: 35-41.

  29. Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbe KA: Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol. 1992, 45: 255-265. 10.1016/0895-4356(92)90085-2.

  30. Klein S, Simes J, Blackburn GL: Total parenteral nutrition and cancer clinical trials. Cancer. 1986, 58: 1378-1386.

  31. Browman GP, Hodson DI, Mackenzie RJ, Bestic N, Zuraw L: Choosing a concomitant chemotherapy and radiotherapy regimen for squamous cell head and neck cancer: A systematic review of the published literature with subgroup analysis. Head Neck. 2001, 23: 579-589. 10.1002/hed.1081.

  32. Figueredo A, Germond C, Maroun J, Browman G, Walker-Dilks C, Wong S, the Gastrointestinal Cancer Disease Site Group: Adjuvant therapy for stage II colon cancer following complete resection. Cancer Prev Control. 1997, 1: 379-392.

  33. Browman GP, Hodson DI, Newman T, the Head and Neck Cancer Disease Site Group: Neoadjuvant chemotherapy in locally advanced squamous cell carcinoma of the head and neck (excluding nasopharynx). Cancer Care Ontario Practice Guidelines Initiative Web site. 15 August 2001, []

  34. Quirt I, Micucci S, Moran LA, Pater J, Browman G, the Systemic Treatment Disease Site Group : Erythropoietin in the management of cancer patients with non-hematologic malignancies receiving chemotherapy. Cancer Prev Control. 1997, 1: 241-248.


We would like to thank Cancer Care Ontario and the Ontario Ministry of Health and Long-Term Care for their financial support.

Author information

Corresponding author

Correspondence to Melissa C Brouwers.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

MB, MJ and GB conceived of the project idea and developed the protocol. MB, MJ, and MC conducted the study. SH provided statistical advice and AJ provided conceptual advice. The manuscript was drafted initially by MB and MJ. All authors contributed to the final version submitted for publication.

About this article

Cite this article

Brouwers, M.C., Johnston, M.E., Charette, M.L. et al. Evaluating the role of quality assessment of primary studies in systematic reviews of cancer practice guidelines. BMC Med Res Methodol 5, 8 (2005).
