### Likelihood based methodology

Suppose that we are interested in comparing the reproducibility of two instruments. Let *x*_{ijl} be the *j*th measurement of the *i*th subject by the *l*th instrument, *j* = 1,2,... *m*_{l}, *i* = 1,2,... *n*, and *l* = 1, 2. To evaluate the WSCV we consider the one-way random effects model

*x*_{ijl} = *μ*_{l} + *b*_{i} + *e*_{ijl}

where *μ*_{l} is the mean value of measurements made by the *l*th instrument, *b*_{i} are independent random subject effects with *b*_{i} ~ *N*(0, {\sigma}_{b}^{2}), and *e*_{ijl} are independent *N*(0, {\sigma}_{l}^{2}). Many authors have used the intra-class correlation coefficient (ICC), {\rho}_{l}={\sigma}_{b}^{2}/\left({\sigma}_{b}^{2}+{\sigma}_{l}^{2}\right), as a measure of reproducibility/reliability [18, 23]. Quan and Shih [8] argued that *ρ*_{l} is study-population based since it involves the between-subject variation: the more heterogeneous the population, the larger *ρ*_{l}. As an alternative, they proposed the within-subject coefficient of variation (WSCV) *θ*_{l} = *σ*_{l}/*μ*_{l} as a measure of reproducibility. It quantifies the closeness of repeated measurements taken on the same subject, either by the same instrument or on different occasions under the same conditions; clearly, the smaller the WSCV, the better the reproducibility. We distinguish the WSCV from the coefficient of variation C{V}_{l}={\left({\sigma}_{b}^{2}+{\sigma}_{l}^{2}\right)}^{1/2}/{\mu}_{l}, since *CV*_{l} involves {\sigma}_{b}^{2} in the numerator and, like *ρ*_{l}, is population based: more heterogeneity in the population results in a larger *CV*_{l}. For that reason we focus our work on the WSCV rather than the CV. We also note the inverse relationship between the ICC (*ρ*_{l}) and the corresponding within-subject variance {\sigma}_{l}^{2}: larger values of the ICC (higher reliability) are associated with smaller values of the WSCV (better reproducibility). The focus of this paper is on statistical inference for the difference between two correlated WSCVs. The inferential procedures assume multivariate normality of the measurements and are mainly likelihood based. The following set-up facilitates the construction of the likelihood function.
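To make the definition concrete, the following sketch (our own illustration, not part of the original analysis) estimates the WSCV for a single instrument from an *n* × *m* array of repeated measurements, using the within-subject mean square as the estimate of {\sigma}_{l}^{2} and the grand mean of the subject means as the estimate of *μ*_{l}:

```python
import numpy as np

def wscv(x):
    """Within-subject coefficient of variation theta = sigma / mu for an
    n-by-m array x (n subjects, m replicates each, one instrument),
    using sigma^2 = S^2 / (n (m - 1)) and mu = grand mean."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    subj_means = x.mean(axis=1)                  # x-bar_i
    s2 = ((x - subj_means[:, None]) ** 2).sum()  # within-subject sum of squares
    sigma2 = s2 / (n * (m - 1))                  # within-subject variance
    mu = subj_means.mean()                       # grand mean
    return np.sqrt(sigma2) / mu
```

For example, `wscv([[9, 11], [19, 21]])` gives sqrt(2)/15 ≈ 0.094; smaller values indicate better reproducibility.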

Let

{X}_{i}={({X}_{i1},{X}_{i2},\mathrm{.......}{X}_{i{m}_{1}},{X}_{i,{m}_{1}+1},{X}_{i,{m}_{1}+2},\mathrm{....},{X}_{i,{m}_{1}+{m}_{2}})}^{\text{'}}

denote the measurements on the *i*^{th} subject, *i* = 1,2,....,*n*, where {X}_{i1},{X}_{i2},\mathrm{....},{X}_{i{m}_{1}} are the *m*_{1} measurements obtained by the first method (platform) and {X}_{i,{m}_{1}+1},{X}_{i,{m}_{1}+2},\mathrm{....},{X}_{i,{m}_{1}+{m}_{2}} are the *m*_{2} measurements obtained by the second method (platform). We assume that *X*_{i} ~ *N*(*μ*, Σ), where {\mu}^{T}=({\mu}_{1}{1}_{{m}_{1}}^{T},{\mu}_{2}{1}_{{m}_{2}}^{T}) and

\Sigma =\left[\begin{array}{cc}{\sigma}_{1}^{2}{I}_{{m}_{1}}+\frac{{\rho}_{1}}{1-{\rho}_{1}}{\sigma}_{1}^{2}{J}_{{m}_{1}}& {\rho}_{12}{\sigma}_{1}{\sigma}_{2}{J}_{{m}_{1}\times {m}_{2}}\\ {\rho}_{12}{\sigma}_{1}{\sigma}_{2}{J}_{{m}_{2}\times {m}_{1}}& {\sigma}_{2}^{2}{I}_{{m}_{2}}+\frac{{\rho}_{2}}{1-{\rho}_{2}}{\sigma}_{2}^{2}{J}_{{m}_{2}}\end{array}\right]

(2)

In these expressions 1_{k} is a column vector with all *k* elements equal to 1, *I*_{k} is the *k* × *k* identity matrix, and *J*_{k} and *J*_{k×t} are *k* × *k* and *k* × *t* matrices with all elements equal to 1. Thus the model assumes that the *m*_{1} observations taken by the first platform have common mean *μ*_{1}, common variance {\sigma}_{1}^{2}, and common intra-class correlation *ρ*_{1}, whereas the *m*_{2} measurements taken by the second platform have common mean *μ*_{2}, common variance {\sigma}_{2}^{2}, and common intra-class correlation *ρ*_{2}. Moreover, *ρ*_{12} denotes the interclass correlation between any pair of measurements *x*_{ij} (*j* = 1,2,... *m*_{1}) and {x}_{i,{m}_{1}+t}\left(t=1,2,\dots {m}_{2}\right); it is also assumed constant across all subjects in the population.
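For concreteness, the covariance matrix (2) can be assembled numerically; the following sketch (function name and argument order are our own) builds the two within-platform blocks and the between-platform block:

```python
import numpy as np

def make_sigma(m1, m2, s1, s2, rho1, rho2, rho12):
    """Block covariance matrix Sigma of equation (2):
    within-platform blocks sigma_l^2 (I + rho_l/(1 - rho_l) J) and
    between-platform blocks rho12 * sigma1 * sigma2 * J."""
    b11 = s1 ** 2 * (np.eye(m1) + rho1 / (1 - rho1) * np.ones((m1, m1)))
    b22 = s2 ** 2 * (np.eye(m2) + rho2 / (1 - rho2) * np.ones((m2, m2)))
    b12 = rho12 * s1 * s2 * np.ones((m1, m2))
    return np.block([[b11, b12], [b12.T, b22]])
```

Positive definiteness of the resulting matrix for a given parameter choice can be checked numerically, e.g. via `np.linalg.eigvalsh`.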

For the *l*^{th} method, the WSCV, denoted *θ*_{l} in the remainder of the paper, is defined as

*θ*_{l} = *σ*_{l}/*μ*_{l}, *l* = 1, 2.

Our primary aim is to develop and evaluate methods of testing *H*_{0}:*θ*_{1} = *θ*_{2}, taking into account the dependence induced by a positive value of *ρ*_{12}. We restrict our evaluation to reproducibility studies with *m*_{1} = *m*_{2} = *m*.

### Methods for testing the null hypothesis

#### Wald test (WT)

If *X*_{1}, *X*_{2},.... *X*_{n} is a sample from the above multivariate normal distribution, then the log-likelihood function *L*, as a function of *ψ* = (*μ*_{1}, *μ*_{2}, {\sigma}_{1}^{2}, {\sigma}_{2}^{2}, *ρ*_{1}, *ρ*_{2}, *ρ*_{12}), is given by:

-2L=Q+nm\mathrm{log}\left({\sigma}_{1}^{2}{\sigma}_{2}^{2}\right)-n\mathrm{log}\left(\left(1-{\rho}_{1}\right)\left(1-{\rho}_{2}\right)\right)+n\mathrm{log}w

(3)

where,

*w* = *u*_{1}*u*_{2} - *m*^{2} {\rho}_{12}^{2},

*u*_{l} = 1 + (*m* - 1)*ρ*_{l}, *l* = 1, 2, and

\begin{array}{c}Q=\frac{{S}_{1}^{2}}{{\sigma}_{1}^{2}}+\frac{m\left(1-{\rho}_{1}\right){u}_{2}}{w{\sigma}_{1}^{2}}{\displaystyle \sum _{i=1}^{n}{\left({\overline{x}}_{i1}-{\mu}_{1}\right)}^{2}}+\frac{{S}_{2}^{2}}{{\sigma}_{2}^{2}}+\frac{m\left(1-{\rho}_{2}\right){u}_{1}}{w{\sigma}_{2}^{2}}{\displaystyle \sum _{i=1}^{n}{\left({\overline{x}}_{i2}-{\mu}_{2}\right)}^{2}}\\ -\frac{2{m}^{2}{\rho}_{12}}{w{\sigma}_{1}{\sigma}_{2}}{\left(\left(1-{\rho}_{1}\right)\left(1-{\rho}_{2}\right)\right)}^{1/2}{\displaystyle \sum _{i=1}^{n}\left({\overline{x}}_{i1}-{\mu}_{1}\right)\left({\overline{x}}_{i2}-{\mu}_{2}\right)}\end{array}

From [24], the conditions {1 + (*m* - 1)*ρ*_{1}}{1 + (*m* - 1)*ρ*_{2}} > *m*^{2} {\rho}_{12}^{2} and -1/(*m* - 1) < *ρ*_{l} < 1 must be satisfied for Σ to be positive definite, i.e. for the data to be a sample from a non-singular multivariate normal distribution.

The summary statistics given in (3) are defined as:

\begin{array}{ll}{\overline{x}}_{ij}={\displaystyle \sum _{k=1}^{m}{x}_{ijk}/m}\hfill & i=1,2,\dots n;j=1,2\hfill \\ {S}_{j}^{2}={\displaystyle \sum _{i=1}^{n}{\displaystyle \sum _{k=1}^{m}{\left({x}_{ijk}-{\overline{x}}_{ij}\right)}^{2}}}\hfill \end{array}
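Combining (3), Q, and the summary statistics above, the -2 log-likelihood can be transcribed directly. The following is our own illustrative transcription (constants not involving the parameters are dropped, as in (3)):

```python
import numpy as np

def neg2_loglik(x1, x2, mu1, mu2, s1, s2, r1, r2, r12):
    """-2 log-likelihood of equation (3) for n-by-m arrays x1, x2
    (platforms 1 and 2); (mu_l, s_l, r_l, r12) are the model parameters."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n, m = x1.shape
    u1, u2 = 1 + (m - 1) * r1, 1 + (m - 1) * r2
    w = u1 * u2 - m ** 2 * r12 ** 2
    xb1, xb2 = x1.mean(axis=1), x2.mean(axis=1)      # subject means
    S1 = ((x1 - xb1[:, None]) ** 2).sum()            # within-subject SS
    S2 = ((x2 - xb2[:, None]) ** 2).sum()
    Q = (S1 / s1 ** 2
         + m * (1 - r1) * u2 / (w * s1 ** 2) * ((xb1 - mu1) ** 2).sum()
         + S2 / s2 ** 2
         + m * (1 - r2) * u1 / (w * s2 ** 2) * ((xb2 - mu2) ** 2).sum()
         - 2 * m ** 2 * r12 * np.sqrt((1 - r1) * (1 - r2)) / (w * s1 * s2)
         * ((xb1 - mu1) * (xb2 - mu2)).sum())
    return (Q + n * m * np.log(s1 ** 2 * s2 ** 2)
            - n * np.log((1 - r1) * (1 - r2)) + n * np.log(w))
```

With ρ_{1} = ρ_{2} = ρ_{12} = 0 and unit variances, the expression reduces to Q = S₁² + S₂², which provides a quick sanity check of the transcription.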

The maximum likelihood estimates (MLEs) of *μ*_{l} and {\sigma}_{l}^{2} are given respectively by {\widehat{\mu}}_{l}={\overline{x}}_{l} and {\widehat{\sigma}}_{l}^{2}={S}_{l}^{2}/n\left(m-1\right), where {\overline{x}}_{l}=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\overline{x}}_{il}} and *l* = 1, 2. Clearly, {\widehat{\sigma}}_{l}^{2} exists only for *m* > 1, so we assume *m* > 1 throughout this paper. Following [24], we obtain {\widehat{\rho}}_{1} and {\widehat{\rho}}_{2} by computing Pearson's product-moment correlation over all possible pairs of measurements that can be formed within platforms 1 and 2 respectively, and {\widehat{\rho}}_{12} by computing this correlation over the *nm*^{2} pairs ({x}_{ij}, {x}_{i,m+t}).
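The pairwise estimator of the intra-class correlation can be sketched as follows (our own illustration: both orderings of every within-subject pair are pooled, so the estimator is symmetric, and Pearson's product-moment correlation is applied to the pooled pairs):

```python
import numpy as np
from itertools import combinations

def icc_pairwise(x):
    """Estimate the intra-class correlation for one platform as Pearson's
    correlation over all within-subject pairs of an n-by-m array x."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    a, b = [], []
    for i in range(n):
        for j, k in combinations(range(m), 2):
            # include both orderings so the estimate is symmetric in (j, k)
            a += [x[i, j], x[i, k]]
            b += [x[i, k], x[i, j]]
    return np.corrcoef(a, b)[0, 1]
```

The between-platform estimate of *ρ*_{12} would be obtained in the same way, pairing each measurement from platform 1 with each from platform 2 within a subject.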

The WT of *H*_{0}:*θ*_{1} = *θ*_{2} requires the variances of {\widehat{\theta}}_{l}, *l* = 1, 2, and \mathrm{cov}\left({\widehat{\theta}}_{1},{\widehat{\theta}}_{2}\right). To obtain these we use elements of Fisher's information matrix together with the delta method [26, 27]. On writing *ψ* = (*ψ*_{1}, *ψ*_{2})', *ψ*_{1} = (*μ*_{1}, *μ*_{2})', and {\psi}_{2}={\left({\sigma}_{1}^{2},{\sigma}_{2}^{2},{\rho}_{1},{\rho}_{2},{\rho}_{12}\right)}^{\prime}, Fisher's information matrix *I* = -*E*⌊∂^{2}*L*/∂*ψ*∂*ψ*'⌋ has the following structure:

I=\left[\begin{array}{cc}{I}_{11}& O\\ O& {I}_{22}\end{array}\right].

(4)

This is based on a result from [26] (page 239) showing that *I*_{12} = {I}_{21}^{\text{'}} = -*E*(∂^{2}*L*/∂*ψ*_{1}∂*ψ*'_{2}) = 0. Therefore, from the asymptotic theory of maximum likelihood estimation we have:

{I}_{11}^{-1}=\left[\begin{array}{cc}\mathrm{var}\left({\widehat{\mu}}_{1}\right)& \mathrm{cov}\left({\widehat{\mu}}_{1},{\widehat{\mu}}_{2}\right)\\ \mathrm{cov}\left({\widehat{\mu}}_{1},{\widehat{\mu}}_{2}\right)& \mathrm{var}\left({\widehat{\mu}}_{2}\right)\end{array}\right]

The elements of *I*_{22} are given in the Appendix.

The elements of {I}_{22}^{-1} form the asymptotic variance-covariance matrix of the maximum likelihood estimators of the covariance parameters. Inverting Fisher's information matrices we get:

\mathrm{var}\left({\widehat{\mu}}_{l}\right)=\frac{{\sigma}_{l}^{2}}{nm\left(1-{\rho}_{l}\right)}\left[1+\left(m-1\right){\rho}_{l}\right].

(5)

Applying the delta method [27], we can show, to the first order of approximation that:

\begin{array}{cc}\mathrm{var}\left({\widehat{\sigma}}_{l}\right)\approx {\sigma}_{l}^{2}/2n\left(m-1\right),& l=1,2\end{array}

(6)

The maximum likelihood estimator of *θ*_{l} is {\widehat{\theta}}_{l}={\widehat{\sigma}}_{l}/{\widehat{\mu}}_{l}. Again, by application of the delta method, we can show to the first order of approximation that:

\mathrm{var}\left({\widehat{\theta}}_{l}\right)\approx \frac{{\theta}_{l}^{4}\left[1+\left(m-1\right){\rho}_{l}\right]}{nm\left(1-{\rho}_{l}\right)}+\frac{{\theta}_{l}^{2}}{2n\left(m-1\right)},

(7)

as was shown by Quan and Shih [8].

Again using the delta method we show approximately that:

\mathrm{cov}\left({\widehat{\theta}}_{1},{\widehat{\theta}}_{2}\right)\approx \frac{2{\theta}_{1}^{2}{\theta}_{2}^{2}{\rho}_{12}}{n\sqrt{\left(1-{\rho}_{1}\right)\left(1-{\rho}_{2}\right)}}.

(8)

From [28] we apply the large sample theory of maximum likelihood to establish that:

Z=\frac{{\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}}{\sqrt{\mathrm{var}({\widehat{\theta}}_{1})+\mathrm{var}({\widehat{\theta}}_{2})-2\mathrm{cov}({\widehat{\theta}}_{1},{\widehat{\theta}}_{2})}}

(9)

is approximately distributed under *H*_{0} as a standard normal deviate. The denominator of *Z* is the standard error of {\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}, denoted *SE*({\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}). Since it contains unknown parameters, its maximum likelihood estimate \widehat{S}E({\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}) is obtained by substituting {\widehat{\theta}}_{l} for *θ*_{l}, {\widehat{\rho}}_{l} for *ρ*_{l}, and {\widehat{\rho}}_{12} for *ρ*_{12}. Moreover, we may construct an approximate (1-*α*)100% confidence interval for (*θ*_{1} - *θ*_{2}) as:

{\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}\pm {z}_{\alpha /2}\widehat{S}E({\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}),

where *z*_{α/2} is the (1-*α*/2)100% cut-off point of the standard normal distribution.
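The Wald procedure can be assembled into a short function (an illustrative sketch; the function name and argument order are our own). It plugs the point estimates into (7) and (8), forms Z as in (9), and returns a confidence interval using the standard normal quantile from Python's `statistics.NormalDist`:

```python
from statistics import NormalDist
from math import sqrt

def wald_test(th1, th2, r1, r2, r12, n, m, alpha=0.05):
    """Wald test of H0: theta1 = theta2; th1, th2, r1, r2, r12 are the
    ML estimates of theta_l, rho_l, rho_12. Returns (Z, CI)."""
    def var_theta(t, r):  # equation (7)
        return (t ** 4 * (1 + (m - 1) * r) / (n * m * (1 - r))
                + t ** 2 / (2 * n * (m - 1)))
    # equation (8)
    cov = 2 * th1 ** 2 * th2 ** 2 * r12 / (n * sqrt((1 - r1) * (1 - r2)))
    se = sqrt(var_theta(th1, r1) + var_theta(th2, r2) - 2 * cov)
    z = (th1 - th2) / se  # equation (9)
    zc = NormalDist().inv_cdf(1 - alpha / 2)
    return z, (th1 - th2 - zc * se, th1 - th2 + zc * se)
```

Comparing |Z| with *z*_{α/2} (or checking whether the interval covers zero) gives the two-sided test.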

#### Likelihood ratio test (LRT)

An LRT of *H*_{0} : *θ*_{1} = *θ*_{2} was developed numerically, using the following algorithm:

1- Set *μ*_{l} = *σ*_{l}/*θ*_{l}, *l* = 1, 2 in Equation (3); thereafter

2- Set *θ*_{1} = *θ*_{2} = *θ* in (3);

3- Minimize the resulting expression with respect to all six parameters (*σ*_{1}, *σ*_{2}, *ρ*_{1}, *ρ*_{2}, *ρ*_{12}, *θ*), and

4- Subtract the minimum of -2*L* computed over all seven parameters (*σ*_{1}, *σ*_{2}, *ρ*_{1}, *ρ*_{2}, *ρ*_{12}, *θ*_{1}, *θ*_{2}) of the unrestricted model from the minimum obtained in step 3.

It then follows from standard likelihood theory that the resulting test statistic is approximately chi-square distributed with 1 degree of freedom under H_{0}.
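The four-step algorithm can be sketched numerically. The sketch below is our own (not the authors' code): it transcribes the -2 log-likelihood of (3) and uses `scipy.optimize.minimize` with the Nelder-Mead method, handling the parameter constraints with a crude penalty; a careful implementation would use better starting values and constraint handling.

```python
import numpy as np
from scipy.optimize import minimize

def neg2L(x1, x2, mu1, mu2, s1, s2, r1, r2, r12):
    """-2 log-likelihood of equation (3) (parameter-free constants dropped)."""
    n, m = x1.shape
    u1, u2 = 1 + (m - 1) * r1, 1 + (m - 1) * r2
    w = u1 * u2 - m ** 2 * r12 ** 2
    # penalize parameter values outside the valid region from [24]
    if min(s1, s2, 1 - r1, 1 - r2, w,
           r1 + 1 / (m - 1), r2 + 1 / (m - 1)) <= 0:
        return 1e12
    xb1, xb2 = x1.mean(axis=1), x2.mean(axis=1)
    S1 = ((x1 - xb1[:, None]) ** 2).sum()
    S2 = ((x2 - xb2[:, None]) ** 2).sum()
    Q = (S1 / s1 ** 2
         + m * (1 - r1) * u2 / (w * s1 ** 2) * ((xb1 - mu1) ** 2).sum()
         + S2 / s2 ** 2
         + m * (1 - r2) * u1 / (w * s2 ** 2) * ((xb2 - mu2) ** 2).sum()
         - 2 * m ** 2 * r12 * np.sqrt((1 - r1) * (1 - r2)) / (w * s1 * s2)
         * ((xb1 - mu1) * (xb2 - mu2)).sum())
    return (Q + n * m * np.log(s1 ** 2 * s2 ** 2)
            - n * np.log((1 - r1) * (1 - r2)) + n * np.log(w))

def lrt_statistic(x1, x2, start):
    """Steps 1-4: substitute mu_l = sigma_l / theta_l, minimize -2L under
    H0 (theta1 = theta2, six parameters) and unrestricted (seven), and
    return the difference; start = (theta, s1, s2, r1, r2, r12)."""
    def h0(p):
        t, s1, s2, r1, r2, r12 = p
        if abs(t) < 1e-8:
            return 1e12
        return neg2L(x1, x2, s1 / t, s2 / t, s1, s2, r1, r2, r12)
    def full(p):
        t1, t2, s1, s2, r1, r2, r12 = p
        if min(abs(t1), abs(t2)) < 1e-8:
            return 1e12
        return neg2L(x1, x2, s1 / t1, s2 / t2, s1, s2, r1, r2, r12)
    t, s1, s2, r1, r2, r12 = start
    f0 = minimize(h0, [t, s1, s2, r1, r2, r12], method="Nelder-Mead").fun
    f1 = minimize(full, [t, t, s1, s2, r1, r2, r12], method="Nelder-Mead").fun
    return f0 - f1  # approximately chi-square with 1 df under H0
```

The returned statistic is compared with the upper-α quantile of the chi-square distribution with 1 degree of freedom.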

#### Score test

One advantage of likelihood based inference is that, in addition to the WT and the LRT, Rao's score test can also be readily developed. Its motivation is that it can sometimes be easier to maximize the likelihood function under the null hypothesis than under the alternative. A standard way to perform the score test of *H*_{0} : *θ*_{1} = *θ*_{2} is to set *θ*_{2} = *θ*_{1} + Δ, so that the null hypothesis is equivalent to *H*_{0} : Δ = 0, where Δ is unrestricted. Replacing *μ*_{l} by *σ*_{l}/*θ*_{l}, the log-likelihood function *L* no longer involves *μ*_{l}.

Let *L* = *L*(Δ; *ψ*^{*}) = *L*(Δ; *θ*_{1}, *σ*_{1}, *σ*_{2}, *ρ*_{1}, *ρ*_{2}, *ρ*_{12}) and {l}_{1}=\frac{\partial L}{\partial \Delta},{l}_{2}=\frac{\partial L}{\partial {\psi}^{\ast}}.

From [28] the score statistic is given by:

S={l}_{1}^{T}{A}_{1\cdot 2}^{-1}{l}_{1},

where

{l}_{1}={\frac{\partial L}{\partial \Delta}\Big|}_{\Delta =0}=\frac{nm}{\widehat{w}{\widehat{\theta}}_{1}^{2}}\left[{\widehat{\mu}}_{1}\left(1-{\widehat{\rho}}_{2}\right)\left(\frac{{\widehat{\theta}}_{1}-{\widehat{\theta}}_{2}}{{\widehat{\theta}}_{1}{\widehat{\theta}}_{2}}\right)\right]

(10)

and {A}_{1\cdot 2}={A}_{11}-{A}_{12}{A}_{22}^{-1}{A}_{21}. The matrices on the right-hand side of {A}_{1\cdot 2} are obtained by partitioning Fisher's information matrix *A* as A=\left(\begin{array}{cc}{A}_{11}& {A}_{12}\\ {A}_{21}& {A}_{22}\end{array}\right), where {A}_{11}=E\left(-\frac{{\partial}^{2}L}{\partial {\Delta}^{2}}\right),{A}_{12}={A}_{21}^{T}=E\left(-\frac{{\partial}^{2}L}{\partial \Delta \partial {\psi}^{\ast}}\right), and {A}_{22}=E\left(-\frac{{\partial}^{2}L}{\partial {\psi}^{\ast}\partial {\psi}^{\ast T}}\right), all evaluated at Δ = 0. When an estimator other than the MLE is used for the nuisance parameters *ψ**, provided that the estimator {\widehat{\psi}}^{\ast} is \sqrt{n}-consistent, the asymptotic distribution of *S* remains chi-square with 1 degree of freedom [29, 30].

The score test has been applied in many situations and has been shown to be locally powerful. Unfortunately, the inversion of {A}_{1\cdot 2} is quite complicated, and no simple expression for *S* that can be easily used is available. Moreover, we found through extensive simulations that while the score test holds its level of significance, it is less powerful than the LRT and WT across all parameter configurations. We therefore restrict our subsequent discussion of power to the LRT and WT.

#### Regression test

Pitman [1] and Morgan [2] introduced a technique to test the equality of variances of two correlated, normally distributed random variables; it reduces to testing for zero correlation between the sums and the differences of the paired data. Bradley and Blackwood [31] extended Pitman and Morgan's idea to a regression context that affords a simultaneous test for both the means and the variances. The test is applicable in many paired-data settings, for example in evaluating the reproducibility of lab test results obtained from two different sources. It can also be used in repeated measures experiments, such as comparing the structural effects of two drugs applied to the same set of subjects. Here we generalize the results of Bradley and Blackwood to test the simultaneous equality of the means and variances of two correlated variables, which implies the equality of their coefficients of variation.

Let {\overline{X}}_{ij}={\displaystyle \sum _{k=1}^{m}{X}_{ijk}}/m, and define {d}_{i}={\overline{X}}_{i1}-{\overline{X}}_{i2}, and {s}_{i}={\overline{X}}_{i1}+{\overline{X}}_{i2}.

Direct application of multivariate normal theory shows that the conditional expectation of *d*_{i} given *s*_{i} is linear [32]. That is,

*E*(*d*_{i} | *s*_{i}) = *α* + *βs*_{i},

where

\alpha =\left({\mu}_{1}-{\mu}_{2}\right)-\left({\mu}_{1}+{\mu}_{2}\right)\left({\sigma}_{1}^{2}-{\sigma}_{2}^{2}\right)k

(11.a)

\beta =\left({\sigma}_{1}^{2}-{\sigma}_{2}^{2}\right)k

(11.b)

where

{k}^{-1}={\sigma}_{1}^{2}{\left(1-{\rho}_{1}\right)}^{-1}\left(1+\left(2m-1\right){\rho}_{1}\right)+{\sigma}_{2}^{2}{\left(1-{\rho}_{2}\right)}^{-1}\left(1+\left(2m-1\right){\rho}_{2}\right) is strictly positive.

The proof is straightforward and is therefore omitted. We note that the conditional expectation (11) does not depend on the parameter *ρ*_{12}.

From (11.a) and (11.b), it is clear that *α* = *β* = 0 if and only if *μ*_{1} = *μ*_{2} and *σ*_{1} = *σ*_{2} simultaneously. Therefore, testing the equality of the two correlated coefficients of variation is equivalent to testing the significance of the regression equation (11). From the theory of least squares, if we define:

\text{TSS}={\displaystyle \sum _{i=1}^{n}{\left({d}_{i}-\overline{d}\right)}^{2}},\text{RSS}={\widehat{\beta}}^{2}{\displaystyle \sum _{i=1}^{n}{\left({s}_{i}-\overline{s}\right)}^{2}} and *EMS* = (*TSS* - *RSS*)/(*n* - 2),

the hypothesis *H*_{0} : *α* = *β* = 0 is rejected when *RSS*/*EMS* exceeds *F*_{v,1,(n-2)}, the (1 - *v*)100% percentile of the *F*-distribution with 1 and (*n* - 2) degrees of freedom [32].
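The test can be computed from the paired subject means in a few lines; the following sketch (our own transcription of TSS, RSS, and EMS as defined above) returns the F statistic to be compared with the 1 and (*n* - 2) degrees-of-freedom reference distribution:

```python
import numpy as np

def regression_test(xbar1, xbar2):
    """F statistic RSS/EMS for the regression of d_i = xbar1_i - xbar2_i
    on s_i = xbar1_i + xbar2_i, with TSS, RSS, EMS as defined in the text."""
    d = np.asarray(xbar1, float) - np.asarray(xbar2, float)
    s = np.asarray(xbar1, float) + np.asarray(xbar2, float)
    n = d.size
    sxx = ((s - s.mean()) ** 2).sum()
    beta = ((s - s.mean()) * (d - d.mean())).sum() / sxx  # least-squares slope
    tss = ((d - d.mean()) ** 2).sum()
    rss = beta ** 2 * sxx
    ems = (tss - rss) / (n - 2)
    return rss / ems
```

Note that the statistic degenerates when the d_{i} fit the regression line exactly (EMS = 0) or when the s_{i} are all equal; such cases should be screened before applying the test.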