Inconsistency in the items included in tools used in general health research and physical therapy to evaluate the methodological quality of randomized controlled trials: a descriptive analysis

Armijo-Olivo, Susan; Fuentes, Jorge; Ospina, Maria; Saltaji, Humam; Hartling, Lisa

doi:10.1186/1471-2288-13-116

Table 1 Characteristics of tools identified in the search update

Study (authors, year)	Area	Number of items	How items were selected for inclusion	Validity	Reliability	Time to complete	Guidelines for use available
NEW TOOLS (2007-2013)
COCHRANE COLLABORATION DEPRESSION, ANXIETY, AND NEUROSIS (CCDAN) [[34, 35]]	Trials of depression, anxiety and neurosis. Psychological and psychiatric trials	23 items	The tool was developed from items included in other health tools (especially the CONSORT statement); expert consensus then determined a pilot tool to be tested.	Face, content and construct validity	Reliability, evaluated through the correlation coefficient among 3 raters for the total score, was high, ranging from r=0.75 to 0.86.	15-20 minutes	No
				Scores from raters correlated highly with year of publication (r=0.37-0.6)
					Reliability for individual items was less strong
			Further validation consisted of determining the reliability of the tool, its internal consistency, and its correlation with overall score and year of publication.		The mean kappa for all 23 items ranged from 0.51 to 0.54 among 3 raters
					Internal consistency measured through Cronbach alpha ranged from 0.65 to 0.78
THE RANDOMIZED CONTROLLED TRIAL PSYCHOTHERAPY QUALITY RATING SCALE (RCT-PQRS TOOL) [[32, 33]]	Psychotherapy	25 items organized in 6 domains: Description of subject (4 items), definition and delivery of treatment (5 items), outcome measures (5 items), data analysis (5 items), treatment assignment (3 items), overall quality of study (3 items)	Items were generated by an informal expert consensus (members of the American Psychiatric Committee on Research on Psychiatric Treatments and outside consultants who were senior psychotherapy and/or psychopharmacology clinical researchers).	The Cronbach α for all 25 items as rated by the primary rater was 0.87.	The ICC for interrater reliability of item 25, the omnibus rating of the quality of the study, was 0.79.	10-15 minutes	Yes
				The correlation between the 24-item total and the omnibus item (item 25) was 0.88.	The ICC for interrater reliability of the total of the first 24 items was 0.76.
				The correlation between the 24-item total and study year was 0.51, significant at P < .0001.	Nine of the individual items had individual ICCs between 0.5 and 0.8 (items 2, 4, 6, 7, 8, 10, 14, 15, and 19).
				The correlation of the omnibus item and study year was 0.47 (P < .0001).	Twelve items had individual ICCs between 0.3 and 0.5 (items 1, 3, 5, 9, 11, 12, 13, 16, 17, 18, 20, and 24), and 3 items had individual ICCs below 0.3 (items 21, 22, and 23).
					Two items had very low variation between studies (77% of studies received a 0 on item 13 and 97% of studies received a 2 on item 21).
THE RCT-NATURAL PRODUCTS TOOL (RCT-NP) [[31]]	Trials of Natural products	28 items	The initial list of items for this study was compiled from items contained in published critical appraisal instruments designed for RCTs of NPs as well as from items suggested by the research team.	Results were compared with those of a published instrument for evaluating the methodological quality of RCTs of natural products. Both instruments gave similar results, indicating criterion (concurrent) validity.	Not reported	Not reported	Yes
			A Delphi process was used to achieve consensus among a group of experts as to which items describing the identity of an NP were essential to consider when critically appraising an RCT of an NP.
				Raters’ answers were also compared with the investigators’ answers (gold standard) to determine criterion validity. No significant differences between them were found.
			The consensus building process was conducted in 2 rounds using email.
			Consensus was considered to have been reached when 80% of participants were in agreement with an item being designated as essential to include in the instrument
			A final list of items considered to be essential by the study participants and investigators was assembled.
			A systematic review of tools used to evaluate the quality of NP trials was performed. Items from all of these tools were compiled.
			To be designated as essential to include in the new critical appraisal instrument, an item had to meet at least 1 of the following 2 inclusion criteria: it had to have been contained in a published instrument that was documented as having been validated or must have had empirical evidence to support its inclusion in a published instrument.
A CHECKLIST TO EVALUATE A REPORT OF A NONPHARMACOLOGICAL TRIAL (CLEAR NPT) [[51]]	Health Research	10 items and 5 subitems	The initial pool of items was drawn from existing quality tools identified by Moher et al. and Verhagen, the CONSORT statement, users’ guides to the medical literature, and the Cochrane Reviewers’ Handbook.	Content validity was provided by experts in the field through the Delphi method	Not reported	10 minutes	Yes
			Items specific to NPT trials identified in a preliminary study and during informal interviews of clinicians working in the field of NPT were added.
			Thirty-eight potential items were identified.
			A Delphi procedure was used to determine the final items included in the tool.
RISK OF BIAS TOOL (RoB) [[4, 10]]	Health Research	The risk of bias tool is based on six domains and 7 items: sequence generation, allocation concealment, blinding, incomplete outcome data, selective outcome reporting, and “other sources of bias.” Critical assessments on the risk of bias (high, low, unclear) are made separately for each domain.	The choice of components for inclusion in the tool was based on empirical evidence showing their association with effect estimates.	Content validity: items were included based on empirical evidence.	Interrater agreement for the individual domains of the risk of bias tool ranged from slight (κ=0.13 for selective reporting) to substantial (κ=0.74 for sequence generation) [13].	~21 minutes	Yes
				Concurrent validity: A high degree of correlation was found between the domains of risk of bias sequence generation compared with Jadad randomisation (k=0.79) and risk of bias allocation concealment compared with Schulz allocation concealment (k=0.73) [13]
					The RoB demonstrated moderate to substantial (mean values 0.56 to 0.76) agreement on three of twelve items [59].
					The interrater agreement was fair (0.40) for selective outcome reporting and almost perfect (0.86) for sequence generation [62].
				Correlation was low for the comparisons between the domains of risk of bias incomplete outcome data and the Jadad withdrawal item, risk of bias overall risk and total Jadad score, and risk of bias overall risk and Schulz allocation concealment [13]
					Interrater agreement for the majority of domains and overall risk of bias was moderate (k = 0.41–0.60) [60].
				The correlations between overall risk of bias assessments and total Jadad score (t= 0.04) and allocation concealment (t = 0.02) were low [60].	The inter-rater reliability across individual domains of the Cochrane Collaboration Risk of Bias Tool (CCRBT) was found to be 0.30, which is considered slight agreement between raters [46]. The inter-rater reliability of the final grade assigned to each paper by this tool was ICC = 0.58 (95% CI 0.20–0.81) [61]
					There was very poor agreement between the Effective Public Health Practice Project Quality Assessment Tool (EPHPP) and the RoB tool in the final grade assigned to each study (kappa = 0.006)[61]
					The inter-rater reliability was substantial for sequence generation (k=0.79) and fair for the other 5 items (k=0.24-0.37). Interrater reliability between consensus evaluations across rater pairs was fair for allocation concealment and “other sources of bias” (k=0.37-0.27), and moderate for sequence generation (k=0.60). [62]

95% CI 95% confidence interval, CCRBT Cochrane Collaboration Risk of Bias Tool, CONSORT Consolidated Standards of Reporting Trials, ICC intraclass correlation coefficient, k kappa, NP natural products, NPT nonpharmacological trial, RCT randomized controlled trial, RoB risk of bias.
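
Several reliability entries in Table 1 report kappa together with a verbal label such as "slight", "fair", "moderate", "substantial" or "almost perfect". The sketch below illustrates how an unweighted Cohen's kappa for two raters can be computed and then mapped to such a label. It is not taken from any of the cited studies: the function names, the example ratings, and the cut-offs (the commonly used Landis and Koch bands) are assumptions made for illustration, and individual studies in the table may have applied different thresholds or a weighted kappa.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items.

    Hypothetical helper for illustration; not code from the cited studies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to a verbal label (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up risk-of-bias judgements for six trials (illustrative only).
rater_a = ["low", "low", "high", "unclear", "low", "high"]
rater_b = ["low", "high", "high", "unclear", "low", "low"]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

Under these assumed bands, the values κ=0.13, 0.40, 0.74 and 0.86 reported for the RoB tool map to "slight", "fair", "substantial" and "almost perfect", which matches the wording used in those rows.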

ISSN: 1471-2288

BMC Medical Research Methodology
