BMC Medical Research Methodology

Table 2 Inter-rater agreements

From: Methodological insights into ChatGPT’s screening performance in systematic reviews

Rater	Versus	Kappa (κ)	95% CI
GP 1	GP 2	0.47	0.39–0.55
	GP 3	0.38	0.31–0.45
	Expert 1	0.53	0.45–0.60
	Expert 2	0.48	0.40–0.55
	ChatGPT	0.28	0.24–0.33
GP 2	GP 3	0.51	0.43–0.59
	Expert 1	0.60	0.52–0.67
	Expert 2	0.57	0.49–0.65
	ChatGPT	0.20	0.16–0.23
GP 3	Expert 1	0.66	0.59–0.72
	Expert 2	0.59	0.52–0.65
	ChatGPT	0.30	0.25–0.34
Expert 1	Expert 2	0.79	0.73–0.84
Expert 1	ChatGPT	0.29	0.25–0.34
Expert 2	ChatGPT	0.28	0.24–0.33
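
Note: the table does not state which kappa variant was used or how the confidence intervals were derived. Assuming Cohen's kappa for two raters, each value compares observed agreement with the agreement expected by chance from the raters' marginal label frequencies. The sketch below illustrates that calculation under this assumption; the function name `cohen_kappa` and the toy include/exclude decisions are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # frequencies, then sum over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy screening decisions (1 = include, 0 = exclude).
toy_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
toy_b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
toy_kappa = cohen_kappa(toy_a, toy_b)
print(f"kappa = {toy_kappa:.2f}")
```

In the toy data the raters agree on 8 of 10 items (observed agreement 0.80) while chance agreement is 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52), roughly 0.58. Confidence intervals such as those in the table require an additional standard-error or bootstrap step that is not shown here.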

ISSN: 1471-2288
