From: Methodological insights into ChatGPT’s screening performance in systematic reviews
Rater | Versus | Kappa (κ) | 95% CI |
---|---|---|---|
GP 1 | GP 2 | 0.47 | 0.39–0.55 |
GP 1 | GP 3 | 0.38 | 0.31–0.45 |
GP 1 | Expert 1 | 0.53 | 0.45–0.60 |
GP 1 | Expert 2 | 0.48 | 0.40–0.55 |
GP 1 | ChatGPT | 0.28 | 0.24–0.33 |
GP 2 | GP 3 | 0.51 | 0.43–0.59 |
GP 2 | Expert 1 | 0.60 | 0.52–0.67 |
GP 2 | Expert 2 | 0.57 | 0.49–0.65 |
GP 2 | ChatGPT | 0.20 | 0.16–0.23 |
GP 3 | Expert 1 | 0.66 | 0.59–0.72 |
GP 3 | Expert 2 | 0.59 | 0.52–0.65 |
GP 3 | ChatGPT | 0.30 | 0.25–0.34 |
Expert 1 | Expert 2 | 0.79 | 0.73–0.84 |
Expert 1 | ChatGPT | 0.29 | 0.25–0.34 |
Expert 2 | ChatGPT | 0.28 | 0.24–0.33 |
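
Each row in the table is a pairwise agreement statistic between two raters. The table labels it only as "Kappa (κ)", so the sketch below assumes Cohen's kappa for two raters making include/exclude screening decisions. The function name `cohen_kappa` and the decision vectors are hypothetical and are not data from the study. The 95% confidence intervals are not reproduced here, because the table does not state how they were derived.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items both raters labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include (1) / exclude (0) decisions, not data from the study.
rater_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this toy example the raters agree on 8 of 10 records, but their marginal frequencies alone would produce 52% agreement by chance, so κ is well below the raw agreement rate. The same correction is why low κ values can coexist with seemingly high raw agreement when most records are excluded.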