# Table 3. Intra-rater agreement between two assessments over time, for individual raters and for all raters pooled

## Content quality

| Rater | Complete congruence | Discrepancy of 1 grade | Discrepancy of 2 grades | Weighted kappa ± standard error | p value |
|---|---|---|---|---|---|
| Rater 1 | 11/23 (48%) | 11/23 (48%) | 1/23 (4%) | 0.020 ± 0.172 | p = 0.909 |
| Rater 2 | 19/25 (76%) | 6/25 (24%) | 0/25 (0%) | 0.559 ± 0.130 | p < 0.001 |
| Rater 3 | 17/25 (68%) | 8/25 (32%) | 0/25 (0%) | 0.460 ± 0.123 | p < 0.001 |
| Rater 4 | 13/22 (59%) | 8/22 (36%) | 1/22 (5%) | 0.236 ± 0.152 | p = 0.120 |
| Mean all raters | 21/25 (84%) | 4/25 (16%) | 0/25 (0%) | 0.669 ± 0.149 | p < 0.001 |
## Formal quality

| Rater | Complete congruence | Discrepancy of 1 grade | Discrepancy of 2 grades | Weighted kappa ± standard error | p value |
|---|---|---|---|---|---|
| Rater 1 | 10/25 (40%) | 12/25 (48%) | 3/25 (12%) | 0.145 ± 0.150 | p = 0.336 |
| Rater 2 | 20/25 (80%) | 4/25 (16%) | 1/25 (4%) | 0.650 ± 0.133 | p < 0.001 |
| Rater 3 | 13/25 (52%) | 11/25 (44%) | 1/25 (4%) | 0.147 ± 0.161 | p = 0.360 |
| Rater 4 | 11/22 (50%) | 8/22 (36%) | 3/22 (14%) | 0.000 ± 0.000 | p = 1 |
| Rater 5 | 19/24 (79%) | 4/24 (17%) | 1/24 (4%) | 0.410 ± 0.249 | p = 0.100 |
| Mean all raters | 13/25 (52%) | 11/25 (44%) | 1/25 (4%) | 0.169 ± 0.150 | p = 0.260 |
1. Twenty-five expert answers were scored at two different points in time by the same raters. The numbers give the number of expert answers (percentage in brackets) that received the same grade both times (complete congruence), that were scored one grade lower or higher the second time (discrepancy of 1 grade), or that were scored two grades lower or higher the second time (discrepancy of 2 grades). Where the total number of expert answers is lower than 25, the respective rater regarded some of the expert answers as "unscorable". Rater 5, a representative of the German CF patient organization and not a member of the care team, scored only the formal aspect of the answers. Kappa values for agreement were interpreted according to the scale of Landis and Koch [13]: poor < 0; slight 0.00-0.20; fair 0.21-0.40; moderate 0.41-0.60; substantial 0.61-0.80; almost perfect 0.81-1.00. A p value of p < 0.05 was regarded as significant.
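As an illustration of the statistic behind the table, the sketch below computes a weighted kappa from a rater's grade-by-grade confusion matrix. The per-grade confusion matrices underlying Table 3 are not shown here, and the weighting scheme used in the study is not stated in this excerpt, so this is a generic sketch assuming linear weights (a one-grade discrepancy counts half as much as a two-grade discrepancy on a three-grade scale); the example matrix is hypothetical.

```python
def weighted_kappa(confusion):
    """Linearly weighted Cohen's kappa from a k x k confusion matrix.

    confusion[i][j] = number of answers graded i at the first scoring
    and j at the second. Grades are ordinal, so larger discrepancies
    receive proportionally larger disagreement weights.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    obs = exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight in [0, 1]
            obs += w * confusion[i][j]               # observed disagreement
            exp += w * row_tot[i] * col_tot[j] / n   # chance-expected disagreement
    return 1.0 - obs / exp

# Hypothetical 2x2 example: 4 congruent scorings, 2 one-grade discrepancies
print(weighted_kappa([[2, 1], [1, 2]]))
```

Complete congruence on every answer yields kappa = 1; kappa near 0 (as for Rater 4's formal quality) means agreement no better than chance.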