  • Correspondence
  • Open access

Checklist for the qualitative evaluation of clinical studies with particular focus on external validity and model validity



It is often stated that external validity is not sufficiently considered in the assessment of clinical studies. Although tools for its evaluation have been established, there is a lack of awareness of their significance and application. In this article, a comprehensive checklist is presented addressing these relevant criteria.


The checklist was developed by listing the most commonly used assessment criteria for clinical studies. Additionally, specific lists for individual applications were included. The categories of biases of internal validity (selection, performance, attrition and detection bias) correspond to structural, treatment-related and observational differences between the test and control groups. Analogously, we have extended these categories to address external validity and model validity, regarding similarity between the study population/conditions and the general population/conditions related to structure, treatment and observation.


A checklist is presented, in which the evaluation criteria concerning external validity and model validity are systemised and transformed into a questionnaire format.


The checklist presented in this article can be applied to both the planning and the evaluation of clinical studies. We encourage prospective users to modify the checklist according to their application and research question. The greater effort needed to evaluate clinical studies in systematic reviews is justified, particularly in light of the influence their conclusions have on therapeutic decisions and the creation of clinical guidelines.



It is known that clinical studies can generate discordant results. This observation is addressed scientifically in various ways. Deviating study results may be understood as scatter around a supposed true value (where the size of the deviation depends on the precision of the methods). An alternative approach is to explain differences not statistically but in terms of content [1]. When considering individual studies, one should estimate to what extent the study conclusions are distorted by systematic bias. Here the focus usually lies on so-called internal validity, the comparability of test and control groups. (Detailed definitions of internal validity and other validity categories are given in the methods section.) When assessing internal validity, a distinction is made between the following factors:

Selection bias: differences between test and control population regarding their structural composition, e.g. in terms of age, gender, duration and severity of illness and others.

Performance bias: differences in the treatment apart from the intervention tested, e.g. more contact, attention or efforts in the verum group.

Detection bias: differences in observation of outcome parameters, e.g. due to inadequate blinding and respective expectations by assessors, due to training effects or others.

Attrition bias: related to differences in dropouts between test and control group.

The goal is to achieve the greatest possible structural, treatment-related and observational similarity between test and control groups through randomisation and blinding, with a subsequent evaluation following the "intention to treat" (ITT) principle [1–3]. Studies that avoid selection, performance, attrition and detection bias relatively well, in relation to the test and control populations, are classified as internally valid. Scoring systems have been developed to support the evaluation of internal validity (e.g., the Jadad Score) [4–6], and assessment criteria of internal validity are also reflected in the EBM hierarchy of study types [7–9]. In contrast, aspects of external validity that refer to the comparability between the study population and the general population of interest are often neglected in quality assessment and are usually not considered as having a possible distortive effect on an article's conclusion [10, 11]. Rothwell stated in 2005 [11]: "There is concern among clinicians that external validity is often poor [...]. Yet researchers, funding agencies, ethics committees, the pharmaceutical industry, medical journals, and governmental regulators alike all neglect external validity, leaving clinicians to make judgments. However, reporting of the determinants of external validity in trial publications and systematic reviews is usually inadequate [...]."

Factors that can lower the representativeness of a study population and thus the external validity are for example:

Process of consenting: patients who give their consent to participate have been shown to differ considerably in severity of illness and other parameters from those who do not give their consent [12, 13].

Consenting and selection criteria: Emmerich et al. [14] interpret the fact that only 7–8% of possible study participants were included in a study as indicating that the study population was highly selected and well motivated, with good levels of compliance and probably better outcomes than "real-life" patients. The most frequent exclusion criteria were relative contraindications to the study intervention and refusal by participants.

Patients' preferences: Protheroe et al. [15] showed that the discrepancy between clinical guidelines and their practical application becomes larger when patient preferences are considered. According to their decision analysis, only 60% of patients with atrial fibrillation preferred anticoagulation, far fewer than the proportion for whom guidelines would have recommended it (up to 90%). When interpreting data on patients' preferences, one should consider that answers in questionnaires or interviews are often discordant with actual decisions.

Furthermore, commonly neglected factors that limit the validity of study results, according to Rothwell [11], are as varied as differences in health care systems, national characteristics and regulations, characteristics of the participating centres and the level of physicians' specialisation (for example, being limited to "special care units").

Regarding such contextual differences one should also distinguish, on the one hand, services' ability to deliver and, on the other hand, clients' uptake and potential to benefit.

Other factors include the choice of outcomes: surrogate parameters, e.g. laboratory values instead of clinical values, and relevant parameters for the patient (general and mental health, emotional balance, vitality and quality of life), all of which are seldom charted in randomised controlled trials (RCTs).

Rothwell suggests that there should be a stronger consideration of external validity criteria in the evaluation of clinical studies, even in guidelines such as the CONSORT [16] or Cochrane Collaboration guidelines [2]. This issue was taken up by Glasgow and colleagues [17] in 2005. Concrete proposals for assessing generalisability in trials of health care interventions were made by Bonell et al. in 2006 [18].

The tools required to evaluate external validity are, in principle, not new – the relevant criteria have been used in methodology lectures for medical students, and are found in many guidelines for the evaluation of clinical studies. It seems, however, that there is a deficit in both the awareness of the actual necessity for this evaluation process and in the actual application of the assessment criteria.

With this article, we present a checklist that encompasses the most important quality assessment criteria regarding external validity and model validity. These criteria have been systematised and formulated as operational questions.


The checklist was developed by listing the most commonly used assessment criteria for clinical studies [2–4, 9, 11, 16, 19–38] and by using specific criteria lists for individual applications. These include, for example: surgical interventions [39]; so-called practical clinical studies, which are characterised particularly by greater heterogeneity of population, intervention and outcome criteria [40]; observational studies [41]; single case analyses of oncology patients [28]; the aforementioned criteria regarding external validity published by Rothwell [11]; model validity as published by Wein [37]; and our own assessment criteria. We extrapolated key elements from internal validity to external validity, adapting them where necessary. We integrated questions from the above-mentioned lists into the scheme of external validity and added criteria derived from the practical experience of the authors (clinical as well as methodological experts). We tested the checklist on two occasions when performing systematic reviews [42, 43].

The systemisation of the criteria has been carried out using the "PICOS" categories (Population, Intervention, Control, Outcome, Setting), and by using the assessment categories regarding internal validity, external validity, model validity, and general study quality. In the following only the essential aspects of external and model validity are pursued.
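As a minimal sketch of how this systematisation might be represented in software, the following Python fragment crosses the PICOS categories with the four assessment categories named above. The class name `ChecklistItem`, the helper `group_items` and the two example questions are illustrative assumptions; they are not taken from tables 2 to 4.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Picos(Enum):
    POPULATION = "Population"
    INTERVENTION = "Intervention"
    CONTROL = "Control"
    OUTCOME = "Outcome"
    SETTING = "Setting"


class Validity(Enum):
    INTERNAL = "internal validity"
    EXTERNAL = "external validity"
    MODEL = "model validity"
    GENERAL = "general study quality"


@dataclass(frozen=True)
class ChecklistItem:
    picos: Picos
    validity: Validity
    question: str  # hypothetical wording, not from the published tables


def group_items(items):
    """Group checklist items by (PICOS category, validity category)."""
    grid = defaultdict(list)
    for item in items:
        grid[(item.picos, item.validity)].append(item)
    return dict(grid)


# Two made-up items, only to show how the grid is filled.
example_items = [
    ChecklistItem(Picos.POPULATION, Validity.EXTERNAL,
                  "Does the study population resemble the target population?"),
    ChecklistItem(Picos.SETTING, Validity.MODEL,
                  "Does the setting correspond to state-of-the-art practice?"),
]
grid = group_items(example_items)
```

Keying the grid by both categories keeps each question addressable by its PICOS element and by the type of validity it probes, which is one plausible way to support the sequential and parallel uses of the checklist.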


The term "internal validity" (IV) refers to the "confidence that the trial design, conduct, and analysis has minimized or avoided biases in its treatment comparison" [44] and is considered as "a measure of the strength of the association between exposure or intervention and outcome within a study" [9]. Internal validity relates to all comparisons made between the test intervention and the controls, not only in RCTs.

The term "external validity" (EV) refers to generalisability (i.e. the extent to which the effects observed in a study truly reflect what can be expected in a target population beyond the people included in the study [2]), which includes the possibility to transfer and apply study results to a distinct population/decision and patient's situation. The most important criteria are conformity with everyday practice and clinical relevance. Difficulties in assessing EV derive from the fact that the target population and target setting for which the study claims to be valid are commonly not described explicitly. The so-called everyday practice or everyday efficacy is sometimes hard to define as well. Moreover, this outer context may change with time (e.g. mutation of infectious agents).

Good external validity in the sense of an adequate reflection of reality (is it correct?) does not necessarily mean good (external) utility in the sense of a useful reflection of reality (what is it good for? e.g. in terms of patients' quantity and quality of life).

The term "model validity" (MV) indicates the concordance between the study design and an ideal setting, e.g. the "state of the art" procedures (see Wein [37]).

The differentiation between EV and MV is not very widespread. The distinction between everyday conditions and ideal conditions becomes important when switching the focus from the confirmation of efficacy in principle to the question of a broader application of an intervention ("everyday effectiveness"). In the first case, it is important to have ideal conditions such as well-trained and highly experienced therapists, a population that is expected to be very sensitive to the intervention, outcome parameters that best reflect the intervention effect, and a setting that ensures optimal compliance (e.g. application of a medication by intravenous infusion in a hospital instead of oral application at home). In the second case, factors such as the practicability of an intervention (e.g. by GPs), accessibility of the intervention for patients, the frequency of concomitant diseases and medications that may be contraindications to the intervention, patients' and therapists' preferences, and others become more important.

It is often assumed that statements or conclusions concerning efficacy are solely related to IV, and that EV can only be used to generate statements concerning the extent of validity (or limits of generalisation). However, we take the position that insufficient MV and also EV can distort statements concerning efficacy/effectiveness. For this reason, the possibilities of bias have been carried over, in analogy to IV, into the categories of EV and MV. The principle of this extrapolation is shown in table 1, where the contrasting aspects of internal and external validity with respect to the above-mentioned bias factors are compiled.

Table 1 System of bias factors, which may affect internal and external validity


The checklists for assessing external and model validity are compiled in tables 2 and 3.

Table 2 Questions for assessing external validity (EV)
Table 3 Questions for assessing model validity (MV)

To answer the questions regarding the EV and the MV, certain information should be collected (table 4).

Table 4 Information to be collected for ascertaining reference values for external validity (EV) and model validity (MV)

Besides the sequential form shown in tables 2 and 3, one can also consider a parallel form (Figure 1).

Figure 1

Table 5

The complete questionnaires (including those for internal validity and general study quality) can be obtained from the authors.


With this compilation of important parameters for MV and EV we propose a checklist that can be used both for planning and for evaluating clinical studies. We would like to stress that adjustments, or even more extensive modifications, may be necessary depending on the specific question of interest. In our experience, only a few aspects are crucial for the quality of the validities in most studies, while others are of marginal importance. Some studies may lose their significance and relevance because of a single crucial error, while others will not, even though several less important parameters are judged insufficient. Establishing and using scores harbours the risk of pseudo-accuracy. We therefore suggest a descriptive evaluation, in which scores are used only to verify one's own evaluation.
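The risk of pseudo-accuracy can be illustrated with a small, hedged Python sketch. The criterion names, the `crucial` flags and the two example studies are invented for illustration; which criteria count as crucial depends on the research question and is not prescribed by the checklist.

```python
def naive_score(ratings):
    """Count fulfilled criteria, ignoring how important each one is."""
    return sum(1 for r in ratings.values() if r["fulfilled"])


def descriptive_summary(ratings):
    """List unfulfilled criteria separately by importance."""
    crucial = sorted(name for name, r in ratings.items()
                     if not r["fulfilled"] and r["crucial"])
    marginal = sorted(name for name, r in ratings.items()
                      if not r["fulfilled"] and not r["crucial"])
    return {"crucial_unfulfilled": crucial, "marginal_unfulfilled": marginal}


# Hypothetical study A: one crucial criterion is not met.
study_a = {
    "population": {"fulfilled": False, "crucial": True},
    "setting": {"fulfilled": True, "crucial": False},
    "outcome": {"fulfilled": True, "crucial": False},
    "co_medication": {"fulfilled": True, "crucial": False},
}

# Hypothetical study B: two marginal criteria are not met.
study_b = {
    "population": {"fulfilled": True, "crucial": True},
    "setting": {"fulfilled": False, "crucial": False},
    "outcome": {"fulfilled": False, "crucial": False},
    "co_medication": {"fulfilled": True, "crucial": False},
}
```

In this made-up example the simple sum score rates study A (3 of 4) above study B (2 of 4), although only study A fails a criterion marked as crucial. The descriptive summary makes that difference visible, which is the reason for preferring description over a bare score.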

The parameters necessary for the evaluation of EV and MV should be discussed individually for each research question and application. The validity of the data needed to determine these criteria is another crucial point. We recommend applying the principles of maximal and minimal contrasting, which have proved useful in qualitative research strategies: for maximal contrast, look for perspectives on a chosen item that are as different as possible (e.g. therapists, methodological experts, patients and relatives with respect to a particular disease); for minimal contrast, look for at least two representatives of each perspective. (General perspectives would be those of bearing responsibility for a decision or action, implementing it, and being affected by it.) The data can be gathered by questionnaires or structured interviews using the items of the checklists (table 4). As for the validity of the collected data, it appears adequate from a pragmatic point of view to consider congruent answers as reliable and to derive the reference data from them, whereas incongruent answers require further analysis. Published data on epidemiology or from clinical studies should be included in the process of compiling reference data. It can be expected that, with more thorough consideration of the criteria for EV and MV in study designs, future data will have higher validity. A further result of collecting data systematically according to a checklist is that gaps in knowledge may become evident, which could be addressed by additional investigations or studies.
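The pragmatic rule described above (at least two representatives per perspective, congruent answers treated as reliable, incongruent answers set aside for further analysis) could be sketched as follows. The function name `assess_item`, the three return labels and the treatment of answers as plain strings are assumptions made for illustration. Whether the chosen perspectives are contrasting enough is a judgement the code does not check.

```python
from collections import defaultdict

MIN_PER_PERSPECTIVE = 2  # "at least two representatives of each perspective"


def assess_item(responses):
    """Classify one checklist item from (perspective, answer) pairs.

    Returns "insufficient" if there are no responses or any perspective
    has fewer than two respondents, "congruent" if all answers agree,
    and "needs_analysis" otherwise.
    """
    by_perspective = defaultdict(list)
    for perspective, answer in responses:
        by_perspective[perspective].append(answer)

    if not by_perspective or any(
        len(answers) < MIN_PER_PERSPECTIVE
        for answers in by_perspective.values()
    ):
        return "insufficient"

    distinct = {a for answers in by_perspective.values() for a in answers}
    return "congruent" if len(distinct) == 1 else "needs_analysis"


# Hypothetical answers to a single checklist item.
responses = [
    ("therapist", "yes"), ("therapist", "yes"),
    ("patient", "yes"), ("patient", "yes"),
]
status = assess_item(responses)
```

Here `status` is `"congruent"`, so the shared answer could serve as reference data. A single dissenting answer would instead return `"needs_analysis"`.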

When the criteria of IV, EV and MV are applied, they will mostly not all be fulfilled to the same extent. That means that studies will usually not be "optimal". Which aspect is prioritised depends on the question of the study. In the systematic reviews we performed using the checklist for EV and MV [42, 43], the studies we identified as being of high quality differed from those identified using criteria of IV alone. Most of the studies only considered aspects of IV. In one review [42] the assessment of effectiveness changed in favour of the treatment when aspects of external validity were prioritised.

An explanatory study investigating causal connections (e.g. efficacy) will focus on IV, although EV and MV should not be neglected, whereas in health-care research the presented aspects of EV should be of primary importance. To obtain a high IV or MV the study population should be as homogeneous as possible, while in evaluating EV it is of great interest to what extent the intervention is also applicable in a heterogeneous population and under heterogeneous conditions, particularly with concomitant diseases and co-medications. Homogeneity within a group is usually attained by restrictive inclusion and exclusion criteria; homogeneity and comparability between groups, by randomisation. With regard to IV, randomisation is the best method, since it is the only adequate means of reducing the risk of an unequal distribution of unknown confounding factors. EV, as presented above, is with high probability affected by randomisation [11, 14, 15].

Furthermore, it can be assumed that MV (which may already be distorted through the selection process alone) will be impaired, since the ethically and methodologically required prerequisite for randomisation (the so-called equipoise, i.e. the unbiased position of the investigator with respect to intervention and control) may not be sufficiently fulfilled. Great experience (high MV!) presumably goes along with a therapist's preference for a certain intervention, which may interfere with the required neutrality towards the treatment options. Considering the therapy preferences of the physician and the patient within a study corresponds to a high MV and EV.

A study design satisfying the needs of both IV and EV could be a 4-arm study, in which two arms represent the respective preference for the test or the control intervention (as an open, unblinded intervention), while the other two arms represent the randomised, blinded trial with genuine equipoise. Further possibilities are studies with a change-to-open-label (COLA) design [24, 45] or propensity score analyses; a very high EV is also associated with the formation and evaluation of medical registers. A particular ethical problem regarding equipoise exists in placebo-controlled studies, where patients should in principle be able to trust that they will receive the best therapy and are not being used solely for the gain of knowledge (see also Horrobin [46]). Strictly speaking, to warrant equipoise, only physicians who consider the treatment-free "therapy" or placebo application to be a justified therapeutic option should carry out placebo-controlled studies.
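The allocation logic of such a 4-arm design might look like the sketch below. It rests on several assumptions not stated in the text: a single combined preference per participant (the text does not say whether patient, physician or both decide), the arm labels, and simple 1:1 randomisation via `random.Random.choice`. A real trial would use a concealed, pre-specified allocation procedure.

```python
import random


def allocate(preference, rng):
    """Assign a participant to one of four arms.

    preference: "test", "control", or None (no preference, i.e. equipoise).
    rng: a random.Random instance used for the randomised arms.
    """
    if preference == "test":
        return "open_test"          # open, unblinded preference arm
    if preference == "control":
        return "open_control"       # open, unblinded preference arm
    if preference is None:
        # Randomised, blinded arms for participants in equipoise.
        return rng.choice(["blinded_test", "blinded_control"])
    raise ValueError(f"unknown preference: {preference!r}")


rng = random.Random(2006)  # fixed seed only to make the sketch repeatable
arms = [allocate(p, rng) for p in ["test", None, "control", None]]
```

Participants with a stated preference always receive the preferred intervention openly, while only those without a preference are randomised, so the two blinded arms preserve the IV of a conventional RCT and the two open arms reflect everyday choices.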

An intervention within a study may be altered, e.g. by individual dose modification or accompanying treatments, satisfying the needs of the everyday life reality. The intervention itself is seen as needed, but not necessarily as sufficient in the individual case. Study designs suitable for these settings are "pragmatic controlled clinical trials" [21, 31, 32], which are, however, deficient in IV.

Naturally, the question arises whether the expenditure to apply the presented checklist is justified. First of all we want to emphasise that from this checklist's systematised compilation not all aspects will need to be addressed for a particular research question and that they also are, though deliberately, partly redundant. Therefore, the expenditure in the actual application will be lessened.

When the checklist is used in study planning to decide which aspects should or should not be considered, the already strenuous effort of this process may increase only slightly. When the checklist is applied to the evaluation of clinical studies, however, it is much more time-consuming than currently used evaluation methods (e.g. the Jadad score). Nevertheless, this effort appears justified considering the expenditure in personnel and funding that studies require and the (psychological) strain on patients who participate in them. Otherwise, studies may be excluded from further evaluation in a meta-analysis or systematic review and may not be considered for generating guidelines for more or less formal reasons; or they will be included because of their high IV despite a low EV. Particularly with respect to the generation of guidelines, which have or should have a large influence on therapy decisions, the relevant factors cannot be weighed carefully enough. Furthermore, it can be expected that the acceptance of guidelines in clinical application will be substantially higher if aspects of external validity were already considered in the planning of the studies.


IV, EV and MV are important parameters when assessing clinical studies. Since EV and MV are often neglected, we have created a comprehensive checklist addressing the different types of validity. The checklist can be applied to both planning and evaluating clinical studies and can be modified according to the actual research question. It is our hope that this checklist will enhance the consideration of EV and MV in particular in clinical trials.


  1. Greenhalgh T: How to read a paper: Papers that summarise other papers (systematic reviews and meta-analyses). BMJ. 1997, 315: 672-675.

  2. Alderson P, Green S, Higgins JPT: Cochrane Reviewers’ Handbook 4.2.1 [updated December 2003]. The Cochrane Library. 2004, Chichester, John Wiley & Sons Ltd.

  3. Greenhalgh T: Assessing the methodological quality of published papers. BMJ. 1997, 315 (7103): 305-308.

  4. Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D, Ambroz A: A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981, 2 (1): 31-49. 10.1016/0197-2456(81)90056-8.

  5. Jadad AR, Moher M, Browman GP, Booker L, Sigouin C, Fuentes M, Stevens R: Systematic reviews and meta-analyses on treatment of asthma: critical evaluation. BMJ. 2000, 320 (7234): 537-540. 10.1136/bmj.320.7234.537.

  6. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ: Assessing the quality of reports of randomized clinical trials: is blinding necessary?. Control Clin Trials. 1996, 17 (1): 1-12. 10.1016/0197-2456(95)00134-4.

  7. AHCPR: Acute pain management in adults: operative procedures. Agency for Health Care Policy and Research. Clin Pract Guidel Quick Ref Guide Clin. 1992, 1-22.

  8. Ollenschläger G, Helou A, Lorenz W: Kritische Bewertung von Leitlinien. Lehrbuch evidenzbasierte Medizin in Klinik und Praxis Schriftenreihe Hans Neuffer Stiftung. Edited by: Kunz R, et al. 2000, Köln, Deutscher ÄrzteVerlag, 156-176.

  9. SIGN 50 (Scottish Intercollegiate Guidelines Network): A guideline developer's handbook. 2001, Edinburgh

  10. Matthiessen PF: Die Therapieentscheidung des Arztes. Z ärztl Fortbildg Qual Gesundwes. 2005, 99: 269-273.

  11. Rothwell PM: External validity of randomised controlled trials: "to whom do the results of this trial apply?". Lancet. 2005, 365 (9453): 82-93. 10.1016/S0140-6736(04)17670-8.

  12. Al-Shahi R, Vousden C, Warlow C: Bias from requiring explicit consent from all participants in observational research: prospective, population based study. BMJ. 2005, 331: 942-946. 10.1136/bmj.38624.397569.68.

  13. Junghans C, Feder G, Hemingway H, Timmis A, Jones M: Recruiting patients to medical research: double blind randomised trial of "opt-in" versus "opt-out" strategies. BMJ. 2005, 331: 940-943. 10.1136/bmj.38583.625613.AE.

  14. Emmerich J, Le Heuzey JY, Bath PMW, Connolly SJ: Indication for antithrombotic therapy for atrial fibrillation: reconciling the guidelines with clinical practice. Eur Heart J Suppl. 2005, 7: C28-33. 10.1093/eurheartj/sui017.

  15. Protheroe J, Fahey T, Montgomery AA, Peters TJ: The impact of patients' preferences on the treatment of atrial fibrillation: observational study of patient based decision analysis. BMJ. 2000, 320 (7246): 1380-1384. 10.1136/bmj.320.7246.1380.

  16. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gotzsche PC, Lang T: The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001, 134 (8): 663-694.

  17. Glasgow RE, Magid DJ, Beck A, Ritzwoller D, Estabrooks PA: Practical clinical trials for translating research to practice: design and measurement recommendations. Med Care. 2005, 43 (6): 551-557. 10.1097/01.mlr.0000163645.41407.09.

  18. Bonell C, Oakley A, Hargreaves J, Strange V, Rees R: Assessment of generalisability in trials of health interventions: suggested framework and systematic review. BMJ. 2006, 333: 346-349. 10.1136/bmj.333.7563.346.

  19. Clark JP: Qualitative research review guidelines – RATS. Modified for BioMed Central. []

  20. Gill P, Dowell AC, Neal RD, Smith N, Heywood P, Wilson AE: Evidence based general practice: A retrospective study of interventions in one training practice. BMJ. 1996, 312: 819-821.

  21. Godwin M, Ruhland L, Casson I, MacDonald S, Delva D, Birtwhistle R, Lam M, Seguin R: Pragmatic controlled clinical trials in primary care: the struggle between external and internal validity. BMC Med Res Methodol. 2003, 3 (1): 28. 10.1186/1471-2288-3-28.

  22. Heusser P: Problem von Studiendesigns mit Randomisation, Verblindung und Placebogabe. Forsch Komplementarmed. 1999, 6: 89-102. 10.1159/000021215.

  23. Heusser P: Kriterien zur Beurteilung des Nutzens von komplementärmedizinischen Methoden. Forsch Komplementarmed Klass Naturheilkd. 2001, 8: 14-23. 10.1159/000057190.

  24. Hogel J, Walach H, Gaus W: Change-to-Open-Label Design. Proposal and discussion of a new design for clinical parallel-group double-masked trials. Arzneimittelforschung. 1994, 44 (1): 97-99.

  25. Hornung J, Bartsch U, Schreiber O: Kriterienkatalog für die methodische Qualität klinischer Therapieprüfungen, Teil 1. Forsch Komplementarmed. 1994, 1 (1): 44-49.

  26. Khan KS, ter Riet G, Popay J, Nixon J, Kleijnen J: STAGE II – Conducting the review. PHASE 5 – Study quality assessment. Undertaking Systematic Reviews of Research on Effectiveness. Edited by: Khan KS, ter Riet G, Glanville J, Sowden AJ, Kleijnen J. 2001, CRD Report Number 4 (2nd Edition)

  27. Kiene H: Komplementäre Methodenlehre der klinischen Forschung - Cognition-based Medicine. 2001, Berlin Heidelberg, Springer Verlag

  28. Kienle GS, Hamre HJ, Portalupi E, Kiene H: Improving the quality of therapeutic reports of single cases and case series in oncology--criteria and checklist. Altern Ther Health Med. 2004, 10 (5): 68-72.

  29. Moher D, Soeken K, Sampson M, Ben-Porat L, Berman B: Assessing the quality of reports of systematic reviews in pediatric complementary and alternative medicine. BMC Pediatr. 2002, 2 (1): 3. 10.1186/1471-2431-2-3.

  30. Moher D, Soeken K, Sampson M, Ben-Porat L, Berman B: Assessing the quality of reports of randomized trials in pediatric complementary and alternative medicine. BMC Pediatr. 2002, 2 (1): 2. 10.1186/1471-2431-2-2.

  31. Resch K: Pragmatic Randomised Controlled Trials for Complex Therapies. Forsch Komplementarmed. 1998, 5 Suppl S1: 136-139. 10.1159/000057335.

  32. Roland M, Torgerson DJ: Understanding controlled trials: What are pragmatic trials?. BMJ. 1998, 316 (7127): 285.

  33. Roland M, Torgerson D: Understanding controlled trials: what outcomes should be measured?. BMJ. 1998, 317 (7165): 1075.

  34. Sackett DL: Bias in analytic research. J Chronic Dis. 1979, 32: 51-63. 10.1016/0021-9681(79)90012-2.

  35. Sackett D, Richardson WS, Rosenberg W, Haynes RB: Evidence Based Medicine. How to practice and teach EBM. 1997, New York, Edinburgh, London, Churchill Livingstone

  36. ter Riet G, Kessels AG: Validity checklist for clinical trials. Complement Ther Med. 1997, 5: 116-118. 10.1016/S0965-2299(97)80010-6.

  37. Wein C: Qualitätsaspekte klinischer Studien zur Homöopathie. 2002, Essen, KVC Verlag

  38. World Medical Association: Declaration of Helsinki. Ethical principles for medical research involving human subjects. http://wwwwmanet/e/policy/b3htm (2004) and Bull World Health Organ. 2001, 79 (4): 373-374.

  39. Millat B, Borie F, Fingerhut A: Patient's preference and randomization: new paradigm of evidence-based clinical research. World J Surg. 2005, 29 (5): 596-600. 10.1007/s00268-005-7920-z.

  40. Tunis SR, Stryer DB, Clancy CM: Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA. 2003, 290 (12): 1624-1632. 10.1001/jama.290.12.1624.

  41. Tooth L, Ware R, Bain C, Purdie DM, Dobson A: Quality of reporting of observational longitudinal research. Am J Epidemiol. 2005, 161 (3): 280-288. 10.1093/aje/kwi042.

  42. Bornhöft G, Maxion-Bergemann S, Matthiessen PF: Die Rolle der externen Validität bei der Beurteilung klinischer Studien zur Demenzbehandlung mit Ginkgo-biloba-Extrakten. Z Gerontol Geriatr. 2006, (accepted for publication).

  43. Bornhöft G, Wolf U, von Ammon K, Righetti M, Maxion-Bergemann S, Baumgartner S, Thurneysen A, Matthiessen PF: Effectiveness, safety and cost-effectiveness of homeopathy in general practice. Forsch Komplementärmed. 2006, 13 (Suppl2): 19-29. 10.1159/000093586.

  44. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S: Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clinical Trials. 1995, 16: 62-73. 10.1016/0197-2456(94)00031-W.

  45. Walach H: Das "Change-to-open-label" (COLA)-Design: Anpassung und Veränderung des Parallelgruppen-Blinddesigns für die klinische Forschung. Z Klin Psychol. 1994, 23 (3): 213-218.

  46. Horrobin DF: Are large clinical trials in rapidly lethal diseases usually unethical?. Lancet. 2003, 361 (9358): 695-697. 10.1016/S0140-6736(03)12571-8.



We would like to thank Dr. Ted Drell and Dr. Vera Kalitzkus for revising the English text.

Author information


Corresponding author

Correspondence to Gudrun Bornhöft.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

Details of contributors:

GB: conception, design, analysis, interpretation, writing; SMB and UW: conception, revising article; GSK: analysis, revising article; AM: interpretation from clinical point of view, revising article; HCV: interpretation from general practitioner's point of view, methodological aspects, revising article; SG: interpretation from qualitative researcher's point of view, writing article; PFM: conception, revising article. All authors read and approved the final manuscript.

Gudrun Bornhöft, Stefanie Maxion-Bergemann, Ursula Wolf, Gunver S Kienle, Andreas Michalsen, Horst C Vollmar, Simon Gilbertson and Peter F Matthiessen contributed equally to this work.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bornhöft, G., Maxion-Bergemann, S., Wolf, U. et al. Checklist for the qualitative evaluation of clinical studies with particular focus on external validity and model validity. BMC Med Res Methodol 6, 56 (2006).
