Table 6 Consensus reached on standards for preferred statistical methods for reliability

From: COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study

Statistical methods standards, each rated on a four-point scale: very good, adequate, doubtful, inadequate.

7. For continuous scores: was an Intraclass Correlation Coefficient (ICC)a calculated? 28/35 (80%) (R2b)

Very good: ICC calculated; the model or formula was described, and matches the study designc and the data. 30/35 (86%) (R2)

Adequate: ICC calculated but model or formula was not described or does not optimally match the study designc, OR Pearson or Spearman correlation coefficient calculated WITH evidence provided that no systematic difference between measurements has occurred.

Doubtful: Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic difference between measurements has occurred, 25/35 (71%) (R2), OR WITH evidence provided that a systematic difference between measurements has occurred. 25/34 (74%) (R2)

Inadequate: (no standard)

8. For ordinal scores: was a (weighted) Kappa calculated? 26/36 (72%) (R2)

Very good: Kappa calculated; the weighting scheme was described, and matches the study design and the data. 27/36 (75%) (R3d)

Adequate: Kappa calculated, but weighting scheme not described or does not optimally match the study design. 19/36 (53%) (R3)

Doubtful: (no standard)

Inadequate: (no standard)

9. For dichotomous/nominal scores: was Kappa calculated for each category against the other categories combined? 23/33 (70%) (R3)

Very good: Kappa calculated for each category against the other categories combined.

Adequate, doubtful, inadequate: (no standard)
a. Generalizability and Decision coefficients are ICCs
b. R2: consensus reached in round 2
c. Based on panelists' suggestions, the steering committee decided after round 3 to use the term 'study design' instead of 'reviewer-constructed research question'
d. R3: consensus reached in round 3
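To make item 7 concrete, here is a minimal pure-Python sketch of one common ICC model — two-way random effects, absolute agreement, single measurement, often written ICC(2,1). The table does not prescribe this particular model; it only requires that whatever model or formula is used be described and match the study design. The function name and data layout are illustrative, not from the source.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one per subject, each holding one score per
    rater. Illustrative sketch only: COSMIN asks that the chosen ICC model be
    reported and match the study design, not that this specific model be used.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ms_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    ) / ((n - 1) * (k - 1))

    # Shrout-Fleiss form of ICC(2,1).
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

This sketch also shows why the table treats a Pearson or Spearman coefficient without evidence of no systematic difference as merely doubtful: with a constant offset between two raters (e.g. rater 2 always scores one point higher), the Pearson correlation is exactly 1, yet the absolute-agreement ICC above falls below 1 because the systematic difference counts as disagreement.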
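Items 8 and 9 call for a (weighted) Kappa. A minimal self-contained sketch of Cohen's kappa with optional linear or quadratic disagreement weights follows; the function name and the weighting choices are illustrative assumptions — the table requires only that the weighting scheme used be described and match the study design.

```python
def cohen_kappa(rater1, rater2, categories, weights=None):
    """Cohen's kappa for two raters; weights: None, "linear", or "quadratic".

    Illustrative sketch: COSMIN requires the weighting scheme to be described
    and to match the study design, not any particular scheme.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed cross-classification counts.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        obs[idx[a]][idx[b]] += 1

    # Expected counts under chance agreement (independent marginals).
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    exp = [[row[i] * col[j] / n for j in range(k)] for i in range(k)]

    def w(i, j):
        # Disagreement weight between category ranks i and j.
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        if weights == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0  # unweighted kappa

    disagree_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    disagree_exp = sum(w(i, j) * exp[i][j] for i in range(k) for j in range(k))
    return 1.0 - disagree_obs / disagree_exp
```

Item 9's per-category standard can then be read as collapsing a nominal scale to one category versus all others and computing unweighted kappa on the result, e.g. `cohen_kappa([x == 'a' for x in rater1], [x == 'a' for x in rater2], [False, True])`, repeated for each category.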