On Jones et al.’s method for assessing limits of agreement with the mean for multiple observers

Background To assess the agreement of continuous measurements between a number of observers, Jones et al. introduced limits of agreement with the mean (LOAM) for multiple observers, representing how much an individual observer can deviate from the mean measurement of all observers. Besides the graphical visualisation of LOAM suggested by Jones et al., it is desirable to supply LOAM with confidence intervals and to extend the method to the case of multiple measurements per observer.

Methods We reformulate LOAM under the assumption that the measurements follow an additive two-way random effects model. Assuming this model, we provide estimates and confidence intervals for the proposed LOAM. Further, this approach is easily extended to the case of multiple measurements per observer.

Results The proposed method is applied to two data sets to illustrate its use. Specifically, we consider agreement between measurements of tumour size and of aortic diameter. For the latter study, three measurement methods are considered.

Conclusions The proposed LOAM and the associated confidence intervals are useful for assessing agreement between continuous measurements.


Background
Clinical decisions regarding diagnosis or treatment are often based on one or more measured quantities such as blood pressure, tumour size, or the diameter of an aorta. To understand the limitations of using such measurements in clinical practice, it is important to quantify how much the measurements may vary.

For almost three decades, Bland-Altman plots have been the standard method for graphical assessment of agreement between continuous measurements made by two observers or methods on a number of subjects [1]. In particular, Bland-Altman plots are often used to assess how well a new measurement method compares to a current gold standard method. However, if the goal is to assess the variability of measurements made by different observers, it is preferable to consider more than two […]

In more detail, consider a study where a continuous quantity is observed on $a$ subjects by $b$ observers (or methods). We let $y_{ij}$ denote an observation from a random variable $Y_{ij}$, which models the measurement performed on the $i$'th subject by the $j$'th observer for $i = 1, \ldots, a$ and $j = 1, \ldots, b$. Assuming no preferred observer, Jones et al. suggested assessing the agreement between measurements made by different observers by investigating how much the measurements vary around the subject-specific average [2]. More formally, they were interested in how much the differences $D_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$ are likely to vary, where $\bar{Y}_{i\cdot}$ denotes the average measurement for subject $i$ across the $b$ observers. For visualising the data, Jones et al. propose a plot of the observed differences $d_{ij} = y_{ij} - \bar{y}_{i\cdot}$ against the observed subject-specific average $\bar{y}_{i\cdot}$. We will refer to this as an agreement plot. For an example of an agreement plot, see Figure 1 below. In the special case of two observers, i.e., $b = 2$, the agreement plot corresponds to the scatter plot of $(\bar{y}_{i\cdot},\, 0.5(y_{i1} - y_{i2}))$ for $i = 1, \ldots, a$, which in turn corresponds to a scaled Bland-Altman plot.
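The construction of the agreement plot coordinates described above can be sketched in a few lines of Python. This is only an illustration: the function name `agreement_points`, the list-of-rows data layout, and the toy numbers are assumptions made for this sketch and do not come from the paper or its R implementation.

```python
from statistics import mean


def agreement_points(y):
    """Return (subject mean, difference to the mean) pairs for an agreement plot.

    y is a list of rows, one row per subject; each row holds one measurement
    per observer. This layout is a hypothetical choice for the sketch.
    """
    points = []
    for row in y:
        row_mean = mean(row)  # subject-specific average
        for value in row:
            # difference between one observer's measurement and the subject mean
            points.append((row_mean, value - row_mean))
    return points


# Toy data (invented): 3 subjects, each measured once by 2 observers.
y = [[10.0, 12.0], [7.0, 6.0], [5.0, 5.0]]
points = agreement_points(y)

# With two observers, observer 1's difference to the mean equals half of the
# Bland-Altman difference between observer 1 and observer 2.
half_differences = [0.5 * (row[0] - row[1]) for row in y]
observer_one = [d for _, d in points[0::2]]
```

Plotting the second coordinate of each pair against the first gives the agreement plot; the last two lists illustrate the two-observer special case mentioned in the text.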
An agreement plot can for example help to detect whether the spread of the differences is associated […] not provide a way to: 1) assess the variation of the LOAM estimate, 2) integrate systematic differences between the observers, and 3) extend the method to multiple observations per observer.

In this paper, we suggest formalising Jones et al.'s approach under a simple two-way random effects model, which allows us to formulate a coherent statistical inference procedure for the LOAM. In addition, we provide not only an implementation in the statistical programming software R, but also simple formulae which can be implemented in, e.g., statistical programming languages, Excel, or automatic web modules for data collection.

As an alternative, we propose to derive LOAM assuming a statistical model for the measurements. Assuming a model provides a theoretical framework in which the LOAM can be constructed in a transparent way and furthermore enables us to supply estimates and confidence intervals for the LOAM. The additive two-way random effects model is

$$Y_{ij} = \mu + A_i + B_j + E_{ij}, \qquad (1)$$

where $\mu$ describes the overall mean, and $A_i$, $B_j$, and $E_{ij}$ are independent random variables following zero-mean normal distributions with variances $\sigma_A^2$, $\sigma_B^2$, and $\sigma_E^2$, respectively. Under this model, measurements made by different observers on different subjects are uncorrelated, while measurements made by different observers on the same subject have covariance $\sigma_A^2$, and measurements made by the same observer on different subjects have covariance $\sigma_B^2$. Thus, the model accounts for correlation among measurements made by the same observer or on the same subject. Note that the measurements are assumed to be homoscedastic with variance $\sigma_A^2 + \sigma_B^2 + \sigma_E^2$. That is, the variance is split into three components: the inter-subject, inter-observer, and residual variance. Following some authors, we use the terms residual variance and intra-observer variance interchangeably.
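To make the model concrete, the following minimal sketch simulates data with the structure described above: an overall mean plus independent, zero-mean normal subject, observer, and residual effects. The additive form, the function name `simulate_two_way`, and all parameter names are assumptions of this sketch; the parameters are standard deviations (square roots of the three variance components).

```python
import random


def simulate_two_way(a, b, mu, sd_a, sd_b, sd_e, seed=0):
    """Simulate an a-by-b table from an additive two-way random effects model.

    Row i is subject i, column j is observer j. All names and the additive
    structure are assumptions of this sketch.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    subject = [rng.gauss(0.0, sd_a) for _ in range(a)]   # one effect per subject
    observer = [rng.gauss(0.0, sd_b) for _ in range(b)]  # one effect per observer
    return [
        [mu + subject[i] + observer[j] + rng.gauss(0.0, sd_e) for j in range(b)]
        for i in range(a)
    ]


# Example call with invented parameter values.
table = simulate_two_way(a=5, b=3, mu=10.0, sd_a=2.0, sd_b=0.3, sd_e=0.6, seed=1)
```

Because each observer effect is shared by a whole column and each subject effect by a whole row, measurements in the same column or the same row are correlated, which is the dependence structure the text describes.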

Under the two-way random effects model stated in Eq. (1), the difference to the mean, $D_{ij}$, is normally distributed with mean zero and variance $(\sigma_B^2 + \sigma_E^2)(b - 1)/b$. Thus, under this model we expect 95% of the differences to be within the limits

$$\pm 1.96 \sqrt{\frac{b - 1}{b}\left(\sigma_B^2 + \sigma_E^2\right)}. \qquad (2)$$

We therefore propose the above as the 95% LOAM.

To estimate $\sigma_B^2$ and $\sigma_E^2$ under the suggested two-way random effects model, we use the unbiased and consistent ANOVA estimates (see, e.g., Chapter 4 of Searle et al. [3]), given by […], with $SS_B$ and $SS_E$ denoting the sums of squares for the observer and residual term, and $\nu_B = b - 1$ and $\nu_E = (a - 1)(b - 1)$. Using these estimates of $\sigma_B^2$ and $\sigma_E^2$, we obtain the following estimate of the 95% […] for the upper 95% LOAM. Flipping the sign of the endpoints provides the corresponding confidence interval for the lower 95% LOAM.

Simulations under the two-way random effects model from Eq. (1) indicate that the coverage probability for the symmetric confidence interval can be quite far from 95% even with a reasonably high number of measurements. In particular, the confidence interval tends to be too narrow to obtain the desired coverage probability when the number of observers is small or moderate. In that case, we recommend using the following asymmetric confidence interval instead.

First, an approximate 95% confidence interval can be obtained for $\sigma_B^2 + \sigma_E^2$ using Eq. (2.2) in Graybill and Wang [4]. Next, transforming this in accordance with Eq. (2), we get an approximate confidence interval for the upper 95% LOAM given by […]

Results from a small simulation study [see Additional file 1] indicate that the "sufficient" number of observers depends on the inter-observer and residual variation. However, it seems that 30-40 observers are generally enough to obtain an actual coverage probability of around 90%-95%.
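The point estimation step described above can be sketched as follows. The variance estimates used here are the standard ANOVA (method-of-moments) estimators for a two-way layout with one measurement per subject and observer; they are assumed, not confirmed, to coincide with the estimates the text refers to. The function name `loam_estimate`, the dictionary keys, and the toy data are hypothetical. The sketch covers only the point estimate, not the confidence intervals.

```python
from math import sqrt


def loam_estimate(y):
    """Estimate variance components and the 95% LOAM from an a-by-b table.

    y[i][j] is the measurement on subject i by observer j (needs a, b >= 2).
    Uses standard two-way ANOVA estimators; assumed to match the text.
    """
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    row_means = [sum(row) / b for row in y]                              # per subject
    col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]   # per observer

    # Sums of squares for the observer term and the residual term.
    ss_b = a * sum((m - grand) ** 2 for m in col_means)
    ss_e = sum(
        (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    nu_b, nu_e = b - 1, (a - 1) * (b - 1)   # degrees of freedom
    ms_b, ms_e = ss_b / nu_b, ss_e / nu_e   # mean squares

    var_b = (ms_b - ms_e) / a   # inter-observer variance (can come out negative)
    var_e = ms_e                # residual (intra-observer) variance
    loam = 1.96 * sqrt((b - 1) / b * (var_b + var_e))
    return {"ss_b": ss_b, "ss_e": ss_e, "var_b": var_b, "var_e": var_e, "loam": loam}


# Toy data (invented): 4 subjects, 3 observers.
y = [[10.0, 12.0, 11.0], [7.0, 6.0, 8.0], [5.0, 5.0, 6.0], [9.0, 11.0, 10.0]]
result = loam_estimate(y)
```

With these estimators the plug-in LOAM simplifies algebraically to $1.96\sqrt{(SS_B + SS_E)/(ab)}$, so the sum of the two estimated components is never negative even when the inter-observer estimate alone is.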

Sample size calculations
Assume we have a fixed number of subjects we want to include in a future study to assess agreement between measurements. Then we may want to determine the number of observers $b$ necessary to obtain a certain half-width of the confidence interval in Eq. (5), such that the confidence interval is the estimated LOAM plus or minus this half-width. This requires initial estimates of $\sigma_B^2$ and $\sigma_E^2$, say $\hat{\sigma}_{B,0}^2$ and $\hat{\sigma}_{E,0}^2$, which can be obtained from, e.g., a pilot study. Then $b$ can be estimated by […]

Caution should be taken here, as the symmetric confidence interval tends to be artificially narrow, as mentioned in Section 2.1.3. It would be preferable to estimate $b$ using the asymmetric confidence interval in Eq. (6) instead, as this in general has a coverage probability closer to the desired 95%. However, as the dependency on $b$ is more complicated, we cannot obtain an estimate of $b$ in closed form, and numerical approximation is needed. To keep it simple, $b$ may be estimated using the above formula, whereupon the width of the resulting asymmetric confidence interval is investigated for that specific choice of $b$.

To assess the extent of the inter-observer and intra-observer variations, we suggest considering a […]

To investigate agreement between observers, we propose first to make the agreement plot with the estimate and confidence interval for the 95% LOAM from Sections 2.1.2-2.1.3, and to calculate the sample means and standard deviations for the measurements grouped by observer or subject. Inspection of the agreement plot and the sample means grouped by observer can reveal whether any observers tend to make unusually large or small measurements. Further, the agreement plot and the grouped standard deviations can be used to check whether the homoscedasticity assumption of the random model is fulfilled. If the model seems reasonable, we report the LOAM estimate along with the associated confidence intervals. Next, we may compare the order of magnitude of $\hat{\sigma}_B^2$ with that of $\hat{\sigma}_E^2$ to investigate how much of the variation is due to different observers. Further, we calculate confidence intervals for $\sigma_B$ and $\sigma_E$. If clinicians deem the observer variation to be negligible, the observer effect could in principle be removed from the random model, entailing that the LOAM and the associated estimate and confidence intervals should be adjusted accordingly [see Additional file 2].

The agreement analysis may be supplemented with an estimate and confidence interval for the ICC. […]

The estimated 95% LOAM are ±1.1 centimetres. Note that the asymmetric confidence interval is much wider than the symmetric one. As mentioned in Section 2.1.3, the symmetric confidence interval may be artificially narrow. The inter-observer standard deviation estimate is 0.29 cm with a confidence interval from 0.07 cm to 0.50 cm. In comparison, the intra-observer standard deviation estimate is 0.58 cm. Jones […]

Using the methods described in Section 2.2 for multiple measurements, we calculate estimates and confidence intervals for the LOAM, $\sigma_B$, and $\sigma_E$ (see Table 3) and make an agreement plot (see Figure 2).
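The diagnostic step of the agreement analysis, i.e. computing sample means and standard deviations grouped by subject and by observer, can be sketched as below. The function name `grouped_summaries`, the list-of-rows layout, and the toy data are assumptions of this sketch and are unrelated to the data sets analysed in the paper.

```python
from statistics import mean, stdev


def grouped_summaries(y):
    """Return (mean, sample SD) pairs grouped by subject and by observer.

    y[i][j] is the measurement on subject i by observer j; this layout is a
    hypothetical choice for the sketch.
    """
    by_subject = [(mean(row), stdev(row)) for row in y]
    # zip(*y) transposes the table so that each tuple is one observer's column
    by_observer = [(mean(col), stdev(col)) for col in zip(*y)]
    return by_subject, by_observer


# Toy data (invented): 3 subjects, 2 observers.
y_toy = [[10.0, 12.0], [7.0, 6.0], [5.0, 5.0]]
by_subject, by_observer = grouped_summaries(y_toy)
```

Observer means that stand out from the rest point to an observer who tends to measure high or low, while subject-wise standard deviations that grow with the subject mean would cast doubt on the homoscedasticity assumption.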
The observer variation constitutes a large part of the total variation and should not be excluded. The LTL method has the largest estimated LOAM, meaning that measurements made by this method tend to vary more. However, the wide confidence intervals for the LOAM indicate that more observers may be needed to assess this properly. We found significantly less intra-observer variation for the LTL and ITI methods than for the OTO method. This finding is in line with the conclusion by Borgbjerg […]

In this study we have chosen to formulate a simple two-way random effects model with additive observer and subject effects. However, in several cases the observers may respond differently to different subjects. For single measurements this interaction effect is confounded with the residual error, but for multiple measurements it could in principle be modelled and estimated. We have chosen not to pursue this here in order to keep the paper focussed on a simple, yet useful extension of Bland-Altman's graphical method.