Power and sample size determination for the group comparison of patient-reported outcomes using the Rasch model: impact of a misspecification of the parameters
- Myriam Blanchin^{1},
- Alice Guilleux^{1},
- Bastien Perrot^{1},
- Angélique Bonnaud-Antignac^{1},
- Jean-Benoit Hardouin^{1} and
- Véronique Sébille^{1}
https://doi.org/10.1186/s12874-015-0011-4
© Blanchin et al.; licensee BioMed Central. 2015
Received: 10 September 2014
Accepted: 20 February 2015
Published: 15 March 2015
Abstract
Background
Patient-reported outcomes (PRO) are important as endpoints in clinical trials and epidemiological studies. Guidelines for the development of PRO instruments and analysis of PRO data have emphasized the need to report methods used for sample size planning. The Raschpower procedure has been proposed for sample size and power determination for the comparison of PROs in cross-sectional studies comparing two groups of patients when an item response model, the Rasch model, is intended to be used for the analysis. The power determination of the test of the group effect using Raschpower requires several parameters to be fixed at the planning stage, including the item parameters and the variance of the latent variable. Wrong choices regarding these parameters can impact the expected power and the planned sample size to a greater or lesser extent, depending on the magnitude of the erroneous assumptions.
Methods
The impact of a misspecification of the variance of the latent variable or of the item parameters on the determination of the power using the Raschpower procedure was investigated through the comparison of the estimations of the power in different situations.
Results
The power of the test of the group effect estimated with Raschpower remains stable or decreases very slightly whatever the values of the item parameters. In most cases, the estimated power decreases when the variance of the latent trait increases. As a consequence, an underestimation of this variance will lead to an overestimation of the power of the test of the group effect.
Conclusion
A misspecification of the item difficulties regarding their overall pattern or their dispersion seems to have no or very little impact on the power of the test of the group effect. In contrast, a misspecification of the variance of the latent variable can have a strong impact as an underestimation of the variance will lead in some cases to an overestimation of the power at the design stage and may result in an underpowered study.
Background
Patient-reported outcomes (PRO) comprise a range of outcomes collected directly from the patient regarding the patient’s health, the disease and its treatment as well as their impact, and include health-related quality of life, satisfaction with care and psychological well-being, among others. There has been growing interest in these outcomes in the past years as they can be helpful to evaluate the effects of treatment on patients’ lives or to study the quality of life of patients along the disease progression in order to adapt the patients’ care [1-3]. The concept measured by PRO cannot be directly observed. In practice, patient-reported outcomes are assessed through questionnaires composed of items that indirectly measure a latent variable representing the concept of interest. Two theories exist for the analysis of the responses of patients to items. The models from Classical Test Theory are based on a score that often sums the responses to the items. Another theory has gained importance in the patient-reported outcomes area [4], Item Response Theory (IRT), which includes models that link the probability of a given answer to an item with item parameters and the latent variable. IRT has shown advantages such as the management of missing data [5], the possibility to obtain an interval measure for the latent trait, the comparison of latent trait levels independently of the instrument and the management of possible floor and ceiling effects [6,7].
Guidelines for the development of PRO instruments and analysis of PRO data have been developed [8-10] and have emphasized the need to report methods used for sample size planning. Indeed, sample size determination is essential at the design stage to achieve the desired power for detecting a clinically meaningful difference in the future analysis. An inadequate sample size may lead to misleading results and incorrect conclusions. Whereas an underestimated sample size may produce an underpowered study, an overestimated sample size raises ethical issues. An overly large sample size will result in more included patients than would have been required, a longer follow-up period and a delayed analysis stage. All these problems may slow down the conclusion of the study and, for example, may delay an improvement of medical care or the availability of a more efficient treatment to patients.
The widely-used sample size formula for the comparison of two normally distributed endpoints in two independent groups of patients is based on a t-test. It has recently been highlighted that this formula is inadequate in the IRT setting [11]. In randomized clinical trials, Holman et al. [12] first studied the power of the test of group effect for the two-parameter logistic model from IRT. This simulation study investigated the power for various values of sample size, number of items and effect size in the context of a comparison of two groups answering a questionnaire composed of dichotomous items. This study was further extended [13] to compare different estimation methods of the power for the comparison of two groups in the context of dichotomously or polytomously scored items, and cross-sectional or longitudinal studies. These two simulation studies were based on the two-parameter logistic model from IRT and its version for polytomous items, the generalized partial credit model. In the framework of the Rasch model [14,15], Hardouin et al. [16] proposed a methodology, named the Raschpower procedure, to determine the power of the Wald test of group effect for PRO cross-sectional studies comparing two groups of patients. In order to validate this theoretical approach, the power computed using Raschpower was compared to the power obtained in several simulation studies corresponding to different cases (cross-sectional [16,17] and longitudinal studies [18], well-specified or misspecified Rasch models [19]). As the Raschpower procedure strongly relies on the mixed Rasch model, which assumes the normality of the distribution of the latent variable, the robustness of this procedure to violation of the underlying model assumptions was also assessed [20].
The powers obtained with the Raschpower method assuming a normal distribution were compared to reference powers obtained from data simulated with a non-normal distribution (a beta distribution leading to U-, L- or J-shaped distributions). Simulation studies have shown that the powers of the test of group effect obtained either from the Raschpower procedure or from the simulated datasets were close to each other. In conclusion, the Raschpower procedure seems robust to non-normality of the latent variable.
The power determination using Raschpower in cross-sectional studies depends on the expected values of the following parameters: the sample size in each group, the number of items, the group effect defined as the expected difference between the means of the latent trait of each group, the item parameters and the variance of the latent trait. These expected values are required at the design stage, which can turn out to be problematic if no previous studies can provide information on their values. If the expected values at the design stage are far from the values estimated in the study at the analysis stage, the intended power for the planned sample size may not be achieved. As the variance of the latent trait and the item parameters are difficult to set at the planning stage of a study, it is highly probable that their expected values will differ from the observed values at the analysis stage. Therefore, the power of the study might differ from the expected power to a greater or lesser extent, depending on the magnitude of the erroneous assumptions regarding the values of the parameters of the study. The objective of this work is to study the impact of a misspecification of the variance of the latent variable or of the item parameters on the determination of the power using the Raschpower procedure.
Methods
Sample size and power determinations using the Rasch model
The latent regression Rasch model
The latent regression Rasch model links the probability of the response of patient i to the dichotomous item j to the latent trait θ and the item difficulty δ_{ j }:
\(P\left(X_{ij}=x_{ij}\mid\theta,\delta_{j}\right)=\frac{\exp\left(x_{ij}\left(\theta-\delta_{j}\right)\right)}{1+\exp\left(\theta-\delta_{j}\right)}\)   (1)
where x_{ ij } is a realization of the random variable X_{ ij } and θ is a realization of the random variable Θ, generally assumed to follow a Gaussian distribution whose mean depends on the group through the group effect γ: \(\Theta\sim N\left(\mu+\gamma g_{i},\sigma_{\theta}^{2}\right)\), with g_{ i } the group membership indicator. The parameters of the Rasch model can then be estimated by marginal maximum likelihood (MML) [21]. A constraint has to be adopted to ensure the identifiability of the model: the mean of the latent variable is often constrained to 0 (μ=0).
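As a minimal illustration of the Rasch item response function in eq. (1), the following Python sketch computes the probability of a positive response (the function name is ours, not part of the Raschpower software):

```python
from math import exp

def rasch_prob(theta, delta):
    """P(X_ij = 1 | theta, delta_j) under the Rasch model:
    exp(theta - delta) / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + exp(-(theta - delta)))

# A patient whose latent trait equals the item difficulty has a
# response probability of exactly 0.5; easier items raise it.
print(rasch_prob(0.0, 0.0))   # 0.5
print(rasch_prob(0.0, -1.0) > 0.5)
```

The logistic form means only the difference θ − δ_{ j } matters, which is why the identifiability constraint μ=0 is needed.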
The Raschpower procedure for power estimation
The Raschpower procedure provides an estimation of the power for the comparison of PRO data in two independent groups of patients when a Rasch family model is intended to be used for the analysis. This procedure is used at the planning stage and is based on a Wald test to detect a group effect. To perform the test of group effect, an estimate \(\hat{\gamma}\) of the group effect γ and its standard error are required. Since no dataset exists at the planning stage, no estimate can be obtained from data. Hardouin et al. [16] proposed to obtain a numerical estimation of the standard error of \(\hat{\gamma}\) from an expected dataset of the patients’ responses.
In this procedure, a dataset of the patients’ responses is first created conditionally on the planning expected values for the sample size in each group (N_{0} and N_{1}), the group effect (γ), the item difficulties (δ_{ j }) and the variance of the latent trait \(\left (\sigma _{\theta }^{2}\right)\). All possible response patterns of the patients are determined. The associated probability and the expected frequency of each response pattern for each group are computed using the mixed Rasch model (eq. 1) given the planning expected values. The expected dataset is composed of all possible response patterns and their associated frequencies.
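The construction of the expected dataset can be sketched as follows. This is our own illustrative Python reimplementation (the actual Raschpower procedure is implemented in Stata), which enumerates all response patterns and integrates the Rasch probabilities over the latent distribution by Gauss-Hermite quadrature:

```python
import itertools
import numpy as np

def expected_dataset(delta, mu, sigma2, n_group, n_nodes=30):
    """Expected frequency of every dichotomous response pattern in one
    group whose latent trait follows N(mu, sigma2), under the Rasch
    model; the latent trait is integrated out by quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    theta = mu + np.sqrt(sigma2) * nodes   # quadrature nodes on the trait scale
    w = weights / weights.sum()            # normal weights, normalized to sum to 1
    freqs = {}
    for pattern in itertools.product([0, 1], repeat=len(delta)):
        p_cond = np.ones(n_nodes)          # P(pattern | theta) at each node
        for x, d in zip(pattern, delta):
            p1 = 1.0 / (1.0 + np.exp(-(theta - d)))
            p_cond = p_cond * (p1 if x == 1 else 1.0 - p1)
        freqs[pattern] = n_group * float(np.dot(w, p_cond))
    return freqs

# One group of 100 patients, 3 items: the expected frequencies over
# all 2^3 patterns sum to the group size.
freq = expected_dataset(delta=[-1.0, 0.0, 1.0], mu=0.0, sigma2=1.0, n_group=100)
print(round(sum(freq.values()), 4))  # 100.0
```

Repeating this for each group with its own latent mean (0 and γ) yields the full expected dataset on which the mixed Rasch model is refitted.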
The expected power of the Wald test of the group effect is then estimated as
\(1-\hat{\beta}=\Phi\left(\frac{\gamma}{\sqrt{\hat{var}\left(\hat{\gamma}\right)}}-z_{1-\alpha/2}\right)\)   (3)
with γ assumed to take on positive values, Φ the cumulative distribution function of the standard normal distribution, z_{1−α/2} the 1−α/2 quantile of the standard normal distribution and \(\hat {var}\left (\hat \gamma \right)\) estimated from the expected dataset. In order to validate the Raschpower procedure, the power computed using Raschpower was previously compared to the power obtained in several simulation studies. The following parameters could vary in the simulation studies: the sample size (in each group or at each time, N=50, 100, 200, 300, 500), the group or time effect (γ=0.2, 0.5, 0.8), the variance of the latent traits \(({\sigma _{1}^{2}}={\sigma _{2}^{2}}=0.25, 1, 4, 9\)), the correlation of the latent traits between two times of measurement for longitudinal studies (ρ=0.4, 0.7, 0.9), the number of items (J=5 or 10) and the number of response categories (K=2, 3, 5, 7). In the present study, a large set of values of the variance of the latent variable and of the item parameters is examined to evaluate the impact of a misspecification of these parameters.
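Once \(\hat {var}\left (\hat \gamma \right)\) is available, the power computation itself is a one-liner. A Python sketch (the function name is ours):

```python
from statistics import NormalDist

def raschpower(gamma, var_gamma_hat, alpha=0.05):
    """Expected power of the Wald test of the group effect:
    1 - beta = Phi( gamma / sqrt(var(gamma_hat)) - z_{1-alpha/2} )."""
    norm = NormalDist()
    z = norm.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    return norm.cdf(gamma / var_gamma_hat ** 0.5 - z)

# A group effect of 0.5 whose estimate has standard error 0.18 yields
# roughly 79% power at the 5% two-sided level.
print(round(raschpower(0.5, 0.18 ** 2), 3))
```

The formula makes explicit why an overestimated \(\hat {var}\left (\hat \gamma \right)\) (e.g. from an overestimated latent variance) translates directly into a lower computed power.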
Misspecification of the variance of the latent variable
To determine the impact of a misspecification of the variance of the latent variable, we compared the powers estimated with Raschpower for a large set of values of \(\sigma _{\theta }^{2}=\{0.25, 0.5, 0.75\), 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9}. By comparing these estimations of the power, the impact of an under- or overestimation of the variance at the planning stage can be assessed. All the parameters used at the planning stage could vary: the sample size in each group (N_{0}=N_{1}=50, 100, 200, 300, 500), the number of items (J=3, 5, 7, 9, 11, 13, 15) and the value of the group effect (γ=0.1, 0.2, 0.5, 0.8). The item difficulties were drawn from the percentiles of a normal distribution with the same characteristics as the latent variable distribution \(N\left (0,\sigma _{\theta }^{2}\right)\).
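The percentile-based draw of item difficulties can be sketched in Python as follows; the evenly spaced percentile grid j/(J+1) is our assumption about the exact grid used:

```python
from statistics import NormalDist

def difficulties_from_percentiles(n_items, sigma2_theta):
    """J item difficulties taken at the evenly spaced percentiles
    j/(J+1), j = 1..J, of N(0, sigma2_theta), a normal distribution
    with the same characteristics as the latent variable."""
    dist = NormalDist(mu=0.0, sigma=sigma2_theta ** 0.5)
    return [dist.inv_cdf(j / (n_items + 1)) for j in range(1, n_items + 1)]

# Five difficulties for a latent variance of 1: centred and symmetric.
deltas = difficulties_from_percentiles(5, 1.0)
print([round(d, 3) for d in deltas])
```

This construction guarantees that the difficulties cover the latent distribution evenly, avoiding floor and ceiling effects by design.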
Misspecification of the item difficulties
To determine the impact of a misspecification of the item difficulties, we compared the powers estimated with Raschpower for a large set of values of δ_{ j }. The item difficulties were drawn from the percentiles of the item distribution defined as an equiprobable mixture of two normal distributions N(−a;0.1x^{2}) and N(a;x^{2}), where a is the gap between the means of the two normal distributions. As a consequence, the mean of the item distribution is equal to 0 and \(x^{2}=\left (\sigma ^{2}_{\delta _{j}}-a^{2}\right)/0.55\) can be expressed as a function of a and \(\sigma ^{2}_{\delta _{j}}\), the variance of the item distribution. The equiprobable mixture for generating the item distribution easily creates two types of distributions: unimodal and bimodal. A unimodal distribution of the item difficulties reflects the situation where the questionnaire is perfectly suited to a population with normally distributed latent traits, which is the case here, contrary to a bimodal distribution. The equiprobable mixture also creates a large number of item distributions in which the item difficulties can be more or less regularly spaced, which may impact the results of Raschpower.
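A quick way to check the variance identity \(x^{2}=\left (\sigma ^{2}_{\delta _{j}}-a^{2}\right)/0.55\) is to sample from the mixture and verify its mean and variance empirically. This is illustrative Python, not the paper's Stata code, which takes percentiles of the mixture rather than random draws:

```python
import random

def draw_difficulty(a, var_delta, rng):
    """One draw from the equiprobable mixture of N(-a, 0.1*x^2) and
    N(a, x^2), with x^2 = (var_delta - a^2)/0.55, so that the mixture
    has mean 0 and variance var_delta."""
    x2 = (var_delta - a * a) / 0.55
    if rng.random() < 0.5:
        return rng.gauss(-a, (0.1 * x2) ** 0.5)
    return rng.gauss(a, x2 ** 0.5)

# Empirical check of the mean/variance identity on a large sample.
rng = random.Random(12345)
sample = [draw_difficulty(a=0.5, var_delta=1.0, rng=rng) for _ in range(200_000)]
m = sum(sample) / len(sample)
v = sum((s - m) ** 2 for s in sample) / len(sample)
print(round(m, 2), round(v, 2))  # both land close to their targets, 0 and 1
```

The identity follows because the mixture variance is a² + (0.1x² + x²)/2 = a² + 0.55x², so fixing \(\sigma ^{2}_{\delta _{j}}\) and a determines x².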
The draw of the parameters and the estimation of power using the Raschpower procedure for all combinations of parameters were performed with Stata software.
Results
Misspecification of the variance of the latent variable
Table 1 Power estimated with the Raschpower procedure for different values of the variance of the latent variable (\(\sigma ^{2}_{\theta }\)), the number of items (J), the group effect (γ) and the sample size per group (N_{g})
| J | N_{g} | γ | \({\sigma ^{2}_{\theta }=0.25}\) | \({\sigma ^{2}_{\theta }=0.5}\) | \({\sigma ^{2}_{\theta }=0.75}\) | \({\sigma ^{2}_{\theta }=1}\) | \({\sigma ^{2}_{\theta }=2}\) | \({\sigma ^{2}_{\theta }=4}\) | \({\sigma ^{2}_{\theta }=9}\) |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 50 | 0.1 | 0.058 | 0.054 | 0.051 | 0.049 | 0.044 | 0.039 | 0.034 |
| | | 0.2 | 0.117 | 0.104 | 0.095 | 0.088 | 0.072 | 0.058 | 0.046 |
| | | 0.5 | 0.482 | 0.417 | 0.367 | 0.328 | 0.237 | 0.162 | 0.104 |
| | | 0.8 | 0.859 | 0.793 | 0.731 | 0.677 | 0.511 | 0.343 | 0.199 |
| 3 | 200 | 0.1 | 0.117 | 0.104 | 0.095 | 0.088 | 0.072 | 0.058 | 0.046 |
| | | 0.2 | 0.337 | 0.289 | 0.254 | 0.229 | 0.168 | 0.119 | 0.081 |
| | | 0.5 | 0.969 | 0.938 | 0.900 | 0.859 | 0.702 | 0.495 | 0.287 |
| | | 0.8 | 1.000 | 1.000 | 0.999 | 0.998 | 0.978 | 0.875 | 0.607 |
| 3 | 500 | 0.1 | 0.229 | 0.198 | 0.176 | 0.159 | 0.121 | 0.090 | 0.064 |
| | | 0.2 | 0.682 | 0.602 | 0.538 | 0.485 | 0.351 | 0.234 | 0.141 |
| | | 0.5 | 1.000 | 1.000 | 0.999 | 0.998 | 0.976 | 0.868 | 0.598 |
| | | 0.8 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 | 0.942 |
| 9 | 50 | 0.1 | 0.084 | 0.071 | 0.064 | 0.059 | 0.049 | 0.041 | 0.036 |
| | | 0.2 | 0.209 | 0.164 | 0.138 | 0.121 | 0.088 | 0.065 | 0.049 |
| | | 0.5 | 0.798 | 0.682 | 0.579 | 0.501 | 0.325 | 0.200 | 0.118 |
| | | 0.8 | 0.991 | 0.970 | 0.929 | 0.877 | 0.674 | 0.433 | 0.234 |
| 9 | 200 | 0.1 | 0.213 | 0.165 | 0.138 | 0.121 | 0.088 | 0.065 | 0.049 |
| | | 0.2 | 0.643 | 0.505 | 0.415 | 0.352 | 0.227 | 0.144 | 0.090 |
| | | 0.5 | 1.000 | 0.998 | 0.992 | 0.977 | 0.856 | 0.612 | 0.340 |
| | | 0.8 | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 | 0.948 | 0.696 |
| 9 | 500 | 0.1 | 0.453 | 0.345 | 0.281 | 0.239 | 0.158 | 0.106 | 0.071 |
| | | 0.2 | 0.958 | 0.878 | 0.788 | 0.707 | 0.482 | 0.295 | 0.163 |
| | | 0.5 | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 | 0.944 | 0.686 |
| | | 0.8 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.974 |
| 15 | 50 | 0.1 | 0.094 | 0.077 | 0.067 | 0.061 | 0.050 | 0.042 | 0.036 |
| | | 0.2 | 0.228 | 0.181 | 0.149 | 0.129 | 0.091 | 0.067 | 0.050 |
| | | 0.5 | 0.768 | 0.695 | 0.607 | 0.532 | 0.346 | 0.210 | 0.122 |
| | | 0.8 | 0.989 | 0.962 | 0.932 | 0.895 | 0.703 | 0.455 | 0.244 |
| 15 | 200 | 0.1 | 0.263 | 0.190 | 0.154 | 0.132 | 0.092 | 0.067 | 0.050 |
| | | 0.2 | 0.737 | 0.578 | 0.467 | 0.392 | 0.245 | 0.152 | 0.093 |
| | | 0.5 | 1.000 | 1.000 | 0.996 | 0.987 | 0.887 | 0.642 | 0.355 |
| | | 0.8 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.961 | 0.719 |
| 15 | 500 | 0.1 | 0.562 | 0.408 | 0.322 | 0.269 | 0.170 | 0.111 | 0.072 |
| | | 0.2 | 0.987 | 0.932 | 0.850 | 0.766 | 0.521 | 0.313 | 0.170 |
| | | 0.5 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.957 | 0.709 |
| | | 0.8 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.980 |
For the other cases, as represented in Figures 3(b) and 3(c), the estimated power first remains stable at 100% for small values of the variance and then decreases when the variance of the latent trait increases. This effect was observed for high values of the group effect γ. The combination of a high group effect and a low variance produces a very high standardized effect that can always be detected regardless of the number of items, which explains the estimated power of 100%. In these cases, as soon as the power begins to decrease (for \(\sigma ^{2}_{\theta }>1\) in Figure 3(c)), the same effects as before are observed, i.e. an underestimation of the variance \(\sigma ^{2}_{\theta }\) leads to a loss of power which is highest for small values of the variance \(\sigma ^{2}_{\theta }\) and high values of J.
Misspecification of the item difficulties
Table 2 Power estimated with the Raschpower procedure for different values of the sample size per group (N_{g}), the group effect (γ), the variance of the item distribution \(\left (\sigma ^{2}_{\delta _{j}}\right)\) and the gap between the means of the two normal distributions (a), when the variance of the latent variable \(\sigma ^{2}_{\theta }=1\) and the number of items J=7
| N_{g} | γ | \({\sigma ^{2}_{\delta _{j}}}\) | a=0 | \({a=\pm \frac {1}{4}\sigma _{\delta _{j}}}\) | \({a=\pm \frac {1}{2}\sigma _{\delta _{j}}}\) | \({a=\pm \frac {3}{4}\sigma _{\delta _{j}}}\) |
|---|---|---|---|---|---|---|
| 50 | 0.1 | 0.25 | 0.057 | 0.057 | 0.057 | 0.057 |
| | | 1 | 0.057 | 0.057 | 0.057 | 0.057 |
| | | 8 | 0.055 | 0.054 | 0.053 | 0.052 |
| 50 | 0.2 | 0.25 | 0.115 | 0.115 | 0.115 | 0.115 |
| | | 1 | 0.114 | 0.114 | 0.114 | 0.113 |
| | | 8 | 0.107 | 0.106 | 0.103 | 0.099 |
| 50 | 0.5 | 0.25 | 0.475 | 0.474 | 0.474 | 0.473 |
| | | 1 | 0.472 | 0.471 | 0.469 | 0.466 |
| | | 8 | 0.432 | 0.427 | 0.413 | 0.387 |
| 50 | 0.8 | 0.25 | 0.854 | 0.855 | 0.856 | 0.855 |
| | | 1 | 0.852 | 0.852 | 0.850 | 0.848 |
| | | 8 | 0.815 | 0.810 | 0.794 | 0.764 |
| 200 | 0.1 | 0.25 | 0.116 | 0.116 | 0.116 | 0.116 |
| | | 1 | 0.115 | 0.115 | 0.114 | 0.114 |
| | | 8 | 0.107 | 0.106 | 0.103 | 0.099 |
| 200 | 0.2 | 0.25 | 0.333 | 0.332 | 0.332 | 0.331 |
| | | 1 | 0.329 | 0.328 | 0.326 | 0.324 |
| | | 8 | 0.299 | 0.296 | 0.286 | 0.268 |
| 200 | 0.5 | 0.25 | 0.968 | 0.968 | 0.968 | 0.967 |
| | | 1 | 0.966 | 0.966 | 0.965 | 0.964 |
| | | 8 | 0.947 | 0.945 | 0.936 | 0.918 |
| 200 | 0.8 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 1 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
| 500 | 0.1 | 0.25 | 0.226 | 0.226 | 0.226 | 0.225 |
| | | 1 | 0.223 | 0.223 | 0.222 | 0.220 |
| | | 8 | 0.204 | 0.202 | 0.195 | 0.184 |
| 500 | 0.2 | 0.25 | 0.675 | 0.675 | 0.674 | 0.673 |
| | | 1 | 0.669 | 0.668 | 0.666 | 0.662 |
| | | 8 | 0.621 | 0.615 | 0.596 | 0.563 |
| 500 | 0.5 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 1 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
| 500 | 0.8 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 1 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 8 | 1.000 | 1.000 | 1.000 | 1.000 |
Illustrative example
The ELCCA (Etude Longitudinale des Changements psycho-économiques liés au CAncer) study is a longitudinal prospective study that enrolled breast cancer and melanoma patients and was approved by an ethical research committee (CPP) prior to being carried out in the department of onco-dermatology at Nantes University Hospital (for melanoma patients) and at the Nantes Institut de Cancérologie de l’Ouest (for breast cancer patients). This study aimed at analyzing the evolution of the life satisfaction (Satisfaction With Life Scale) of patients after cancer and its interaction with health-related quality of life (EORTC QLQ-C30), the economic situation and the disease-related psychological changes (Post-Traumatic Growth Inventory [22]) measured at different times (1, 6, 12 and 24 months after diagnosis). Positive changes after the cancer experience have been highlighted in several studies on post-traumatic growth, especially regarding life priorities and relations with others. The impact of a misspecification of the parameters on the power determination can be illustrated by determining the a priori power of the test of group effect between breast cancer and melanoma patients regarding the dimension “relation with others” of the Post-Traumatic Growth Inventory in the ELCCA study at 6 months post-diagnosis (first period of change). The dimension “relation with others” is composed of 7 items having 6 response categories. To determine the power, the Raschpower procedure required the expected values of the following parameters: (i) the group effect, (ii) the number of items: J=7, (iii) the item parameters, (iv) the variance of the latent variable and (v) the sample size in each group (n_{0}=213 for breast cancer and n_{1}=78 for melanoma). The choice of expected values for these parameters may be difficult and can be guided by a pilot study.
Table 3 A priori power estimated with the Raschpower procedure from a pilot study and impact of misspecified parameters on the power (\(1-\hat {\beta }\))
| Item parameters | Group effect | Variance of the latent variable | \({1-\hat {\beta }}\) | Misspecified item parameters | Misspecified variance |
|---|---|---|---|---|---|
| Pilot: \(\hat {\delta }_{\textit {jpPILOT}}\) | \(\hat {\gamma }_{\textit {PILOT}}=0.1888\) | Pilot: \(\hat {\sigma }_{\textit {PILOT}}^{2}=0.7858\) | 0.3837 (a priori) | | |
| ELCCA: \(\hat {\delta }_{\textit {jpELCCA}}\) | \(\hat {\gamma }_{\textit {PILOT}}=0.1888\) | Pilot: \(\hat {\sigma }_{\textit {PILOT}}^{2}=0.7858\) | 0.3771 | YES | |
| Pilot: \(\hat {\delta }_{\textit {jpPILOT}}\) | \(\hat {\gamma }_{\textit {PILOT}}=0.1888\) | ELCCA: \(\hat {\sigma }_{\textit {ELCCA}}^{2}=1.0864\) | 0.3004 | | YES |
| ELCCA: \(\hat {\delta }_{\textit {jpELCCA}}\) | \(\hat {\gamma }_{\textit {PILOT}}=0.1888\) | ELCCA: \(\hat {\sigma }_{\textit {ELCCA}}^{2}=1.0864\) | 0.2983 | YES | YES |
Since the ELCCA data have been collected, we can now look at the estimations of the item parameters of the ELCCA study, \(\hat {\delta }_{\textit {jpELCCA}}=\left (\begin {array}{ccccc} -0.9735 & -1.0501 & -1.7684 & -0.0987 & 2.1514\\[.5pt] -0.6494 & -0.9946 & -1.3959 & 0.7675 & 2.1610\\[.5pt] -0.2551 & -0.9686 & -0.9510 & 1.3100 & 2.4691\\[.5pt] -0.3091 & -1.2309 & -1.6587 & 0.5290 & 2.1048\\[.5pt] -0.5618 & -1.4289 & -1.2661 & 1.1758 & 2.5396\\[.5pt] -0.6131 & -1.2691 & -1.5260 & 1.0783 & 2.3768\\[.5pt] -0.9466 & -0.5453 & -1.9003 & 1.0190 & 2.7882 \end {array}\right)\). As the final item parameters estimated from ELCCA are noticeably different from the item parameters estimated from the pilot study used to determine the a priori power, we can wonder how much the power is impacted by this misspecification of the item parameters. The power determined with the final item parameters estimated from ELCCA (line 2 of Table 3) and the group effect and the variance estimated from the pilot study is equal to 37.71%. So, using the item parameters from the pilot study led to an underestimation of the power of around 1%. Similarly, we can look at the estimated variance of the latent variable in ELCCA, \(\hat {\sigma }_{\textit {ELCCA}}^{2}=1.0864\). The power determined with the final variance estimated from ELCCA (line 3 of Table 3) is equal to 30.04%. So, the underestimation of the variance (0.79 instead of 1.09) led to an overestimation of the power of around 8%. If we now look at the combined effect of misspecifying the item parameters and the variance, the power determined with the final item parameters and variance estimated from ELCCA (line 4 of Table 3) is equal to 29.83%, close to the power where only the variance was misspecified (30.04%). It is clear from this example that a misspecification of the variance of the latent variable can have a large impact on the determination of the power, whereas a misspecification of the item parameters has less impact.
Finally, the post hoc power determined with the final group effect (\({\hat {\gamma }_{\textit {ELCCA}}=-0.0408}\)), variance \({(\hat {\sigma }_{\textit {ELCCA}}^{2}= 1.0864)}\) and item parameters (\(\hat {\delta }_{\textit {jpELCCA}}\)) estimated from the ELCCA study turns out to be very small (1.15%), as the estimated group effect is close to 0.
Discussion
The determination of the power of the test of group effect using Raschpower at the design stage relies on the planning expected values of the sample size in each group (N_{0} and N_{1}), the group effect (γ), the item difficulties (δ_{ j }) and the variance of the latent trait \(\left (\sigma _{\theta }^{2}\right)\). In this study, the impact of a misspecification of the item difficulties or of the variance of the latent trait on the power was assessed through the comparison of the estimations of the power in different situations. It seems that a misspecification of the item difficulties regarding their overall pattern (change in a) or their dispersion (change in \(\left.\sigma ^{2}_{\delta _{j}}\right)\) has no or very little impact on the power. The parameters a and \(\sigma ^{2}_{\delta _{j}}\) characterize the equiprobable mixture of normal distributions from which the item difficulties were drawn. Their values were deliberately chosen to avoid ceiling and floor effects, as the Raschpower procedure has been validated in previous work in cases where no or few floor and ceiling effects are observed [17] (when the mean of the latent variable differs from the mean of the item distribution, for similar variances). For this reason, in this study, the means of the latent variable and item distributions were equal and the values of the variance of the item distribution \(\sigma ^{2}_{\delta _{j}}\) were limited to \(8 \times \sigma ^{2}_{\theta }\). It follows that a misspecification of the item difficulties at the design stage matters little as long as no floor or ceiling effect has been created by the misspecification.
Other distributions might have been chosen to draw the item difficulties from. However, it seems that the form of the distribution has very little impact on the determination of power with the Raschpower procedure. In contrast, the occurrence of floor or ceiling effects may impact the determination of the power. These effects are due to a gap between the means of the latent variable distribution and the item distribution. When these two distributions do not overlap, some items can be too difficult or too easy for the population. Floor or ceiling effects can also result from an item distribution more spread out than the latent variable distribution, where the easy items will be too easy and the difficult items too difficult for the population. So, the characteristics of the distribution seem to have more impact on the correct determination of the power than the form of the distribution. Therefore, we can expect similar results if the item parameters were drawn from a distribution having a different form but the same characteristics as the equiprobable mixture of normal distributions, where no ceiling or floor effects occur.
In contrast, a misspecification of the variance of the latent variable can have a strong impact, as an underestimation of the variance \(\sigma ^{2}_{\theta }\) will lead to an overestimation of the power at the design stage and may result in an underpowered study. The decrease of power between the expected power and the observed power due to an underestimation of the variance is highest for small values of the variance \(\sigma ^{2}_{\theta }\) and high values of J. The observed decrease of power is due to the assumption that the value of the group effect was correctly specified at the design stage and that the misspecification occurred only on the variance. Indeed, an increase of the variance of the latent variable \(\sigma ^{2}_{\theta }\) causes an increase of the estimated variance of the group effect \(\hat {var}\left (\hat \gamma \right)\). Hence, as the estimation of the power (equation 3) includes the ratio \(\frac {\gamma }{\sqrt {\hat {var}(\hat \gamma)}}\), an increase of \(\sigma ^{2}_{\theta }\) leads to a decrease of this ratio and eventually to a decrease of power. Furthermore, the assumption of a correct specification of the group effect also explains the observed plateau of the power at 100% for small values of \(\sigma ^{2}_{\theta }\) and high values of γ, as the standardized effect \(\frac {\gamma }{\sigma _{\theta }}\) to detect is large and greater than 1.
The increase of power with the number of items, the group effect and the sample size is consistent with previous work in item response theory [12,13]. The good performance of the Raschpower procedure illustrated in different settings [16,18] strengthens the previous finding that methods based on marginal maximum likelihood estimation and accounting for the unreliability of the latent outcome provide adequate power in item response theory [13]. This study emphasizes the potentially strong impact of misspecifying the variance of the latent variable in power and sample size determinations for PRO cross-sectional studies comparing two groups of patients. This effect of the variance is certainly not limited to power and sample size determinations in the Rasch model, or even in item response theory, but probably also pertains to sample size calculations based on observed variables. It must be noted that the expected value of the variance should be cautiously chosen to compute a sample size and plan a study, and carefully estimated to determine a post hoc power.
Even though this study of the impact of the misspecification of the parameters pertains to the comparison of PRO data evaluated by dichotomous items in two independent groups of patients, the Raschpower procedure was also developed for polytomous items and/or longitudinal studies [18]. We can assume that, in such settings, a misspecification of the variance may also have an impact on the estimation of the power whereas this estimation may not suffer from a misspecification of the item parameters. For longitudinal studies, the impact of a misspecification of the parameters will not only depend on the value of the variance of the latent variable σ^{2} but also on the whole covariance matrix, i.e. on the variance of the latent variable at each measurement occasion and its correlation between measurement occasions. For questionnaires composed of polytomous items, this impact will depend on the number of items and also on the number of response categories of the items.
A number of software programs or websites are useful for power analysis and sample size calculation. Some specialized programs (G*POWER, PASS, NQuery Advisor, PC-Size, PS) and some more general statistical programs (SAS, Stata, R) can provide power and sample size through the t-test based formula for the comparison of two normally distributed endpoints in two independent groups of patients. Unfortunately, this formula is not adequate in the Rasch model setting [11] and, to our knowledge, the correct determination of the sample size or power for a study intended to be analysed with a Rasch model is not available in any software or website. To provide an easy way to determine the sample size and power in this setting, the whole Raschpower procedure has been implemented in the Raschpower module freely available at the website PRO-online http://pro-online.univ-nantes.fr. This module determines the expected power of the test of the group effect for cross-sectional studies or of the test of time effect for longitudinal studies, given the expected values defined by the user. This study has exemplified the importance of the determination of the expected value of the variance of the latent variable. In order to help design studies when a Rasch model is intended for the analysis and when the expected value of the variance of the latent variable is highly uncertain, a graphical option is also available in the Raschpower module. Given the expected values for the sample size in each group (N_{0} and N_{1}), the group effect (γ) and the item difficulties (δ_{ j }), it provides a chart similar to Figure 3 representing the expected power as a function of a range of values of the variance of the latent variable. This chart can help to make an informed choice and may avoid insufficiently powered studies.
Conclusions
This study emphasizes the potentially strong impact of misspecifying the variance of the latent variable in power and sample size determination for PRO cross-sectional studies comparing two groups of patients. A misspecification of this variance can lead to an overestimation of the power of the test of the group effect at the design stage and may result in an underpowered study.
Declarations
Acknowledgements
This study was supported by the French National Research Agency, under reference n^{o} 2010 PRSP 008 01.
References
- Swartz RJ, Schwartz C, Basch E, Cai L, Fairclough DL, McLeod L, et al.; SAMSI Psychometric Program Longitudinal Assessment of Patient-Reported Outcomes Working Group. The king’s foot of patient-reported outcomes: current practices and new developments for the measurement of change. Qual Life Res. 2011; 20(8):1159–67.
- Greenhalgh J. The applications of PROs in clinical practice: what are they, do they work, and why? Qual Life Res. 2009; 18(1):115–23.
- Willke RJ, Burke LB, Erickson P. Measuring treatment impact: a review of patient-reported outcomes and other efficacy endpoints in approved product labels. Controlled Clin Trials. 2004; 25(6):535–52.
- Thomas ML. The value of item response theory in clinical assessment: a review. Assessment. 2011; 18(3):291–307.
- de Bock E, Hardouin J-B, Blanchin M, Le Neel T, Kubis G, Sébille V. Assessment of score- and Rasch-based methods for group comparison of longitudinal patient-reported outcomes with intermittent missing data (informative and non-informative). Qual Life Res. 2015; 24(1):19–29.
- Nguyen TH, Han H-R, Kim MT, Chan KS. An introduction to item response theory for patient-reported outcome measurement. Patient. 2014; 7(1):23–35.
- Reeve BB, Hays RD, Chang C, Perfetto EM. Applying item response theory to enhance health outcomes assessment. Qual Life Res. 2007; 16(S1):1–3.
- Calvert M, Blazeby J, Altman DG, Revicki DA, Moher D, Brundage MD, et al. Reporting of patient-reported outcomes in randomized trials: the CONSORT PRO extension. J Am Med Assoc. 2013; 309(8):814–22.
- Brundage M, Blazeby J, Revicki D, Bass B, de Vet H, Duffy H, et al. Patient-reported outcomes in randomized clinical trials: development of ISOQOL reporting standards. Qual Life Res. 2013; 22(6):1161–75.
- Revicki DA, Erickson PA, Sloan JA, Dueck A, Guess H, Santanello NC. Interpreting and reporting results based on patient-reported outcomes. Value Health. 2007; 10(Suppl 2):116–24.
- Sébille V, Hardouin J-B, Le Néel T, Kubis G, Boyer F, Guillemin F, et al. Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients – a simulation study. BMC Med Res Methodol. 2010; 10:24.
- Holman R, Glas CAW, de Haan RJ. Power analysis in randomized clinical trials based on item response theory. Controlled Clin Trials. 2003; 24(4):390–410.
- Glas CAW, Geerlings H, van de Laar MAFJ, Taal E. Analysis of longitudinal randomized clinical trials using item response models. Contemporary Clin Trials. 2009; 30(2):158–70.
- Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press; 1980.
- Fischer GH, Molenaar IW. Rasch Models: Foundations, Recent Developments, and Applications. New York: Springer; 1995.
- Hardouin J-B, Amri S, Feddag M-L, Sébille V. Towards power and sample size calculations for the comparison of two groups of patients with item response theory models. Stat Med. 2012; 31(11-12):1277–90.
- Blanchin M, Hardouin J-B, Guillemin F, Falissard B, Sébille V. Power and sample size determination for the group comparison of patient-reported outcomes with Rasch family models. PLoS ONE. 2013; 8(2):e57279.
- Feddag M-L, Blanchin M, Hardouin J-B, Sébille V. Power analysis on the time effect for the longitudinal Rasch model. J Appl Meas. 2014; 15(3):292–301.
- Feddag M-L, Sébille V, Blanchin M, Hardouin J-B. Estimation of parameters of the Rasch model and comparison of groups in presence of locally dependent items. J Appl Meas. 2014; in press.
- Guilleux A, Blanchin M, Hardouin J-B, Sébille V. Power and sample size determination in the Rasch model: evaluation of the robustness of a numerical method to non-normality of the latent trait. PLoS ONE. 2014; 9(1):e83652.
- Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika. 1981; 46(4):443–59.
- Tedeschi RG, Calhoun LG. The Posttraumatic Growth Inventory: measuring the positive legacy of trauma. J Traumatic Stress. 1996; 9(3):455–71.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.