We re-analysed a data set originally collected to investigate the prognostic validity of different approaches for measuring change [16]. The original study had been motivated by the decision to use direct measurements of change in the quality-assurance programme for medical rehabilitation clinics under the purview of German statutory pension funds [17].
Sample
Five rehabilitation clinics located in the German federal state of Schleswig-Holstein recruited study participants from August to November 1999 using the following inclusion criteria: (a) aged 18 to 60 years, (b) German speaking, (c) participating in a rehabilitation programme for either a musculoskeletal disease (ICD-9 710 to 739.9) or a cardiovascular disease (ICD-9 393 to 429.9) at one of the five cooperating clinics.
Four hundred and twenty-six patients gave written, informed consent to participate in the study. They filled out a self-administered questionnaire both before treatment (t0; responding: n = 426, 100%) and after treatment (t1; responding: n = 397, 93.2%). At t1, all participants were additionally asked to fill out one of two further questionnaires, designed to measure change either directly (transition ratings) or quasi-indirectly (the “then-test” approach). In each clinic, participants were randomly allocated 1:1 either to group 1 (reporting change directly; responding: n = 194) or to group 2 (reporting their pre status retrospectively; responding: n = 201). The standard duration of rehabilitation was three weeks, which corresponds to the interval between t0 and t1. Figure 2 illustrates the study design.
The original study also included follow-up measurement points 6 and 12 months after t0 in order to analyse the predictive validity of the three approaches to measuring change. These results are not part of the present analysis.
Outcomes
Questionnaires at t0 and t1 gathered information on patients’ subjective health status (general health status, sleep, concentration, vitality, symptom checklist, pain, social functioning and physical functioning [18–21]). At t0, we assessed patients’ socio-demographic profile (age, sex, education, citizenship, marital status, net income), socio-medical characteristics (e.g. health insurance status, pension fund, healthcare utilisation, and any severe disabilities or disabilities currently preventing them from working), physical activities, risk factors (alcohol and nicotine consumption), medications, height, and weight. We analysed (1) the four-item “sleep function” subscale of the IRES (Indicators of Rehabilitation Status; six response categories), which is a generic health-related quality-of-life measure widely used in German rehabilitation research and quality-assurance programmes [18, 19], (2) the ten-item “physical functioning index” subscale of the Short Form 36 (SF-36; three response categories) [20], and (3) the 12-item “somatisation” subscale of the Symptom Checklist 90-R (SCL-90-R; four response categories) [21]. These three scales were selected for clinical and psychometric reasons. Musculoskeletal and cardiovascular diseases often involve somatisation, functional impairments, and insomnia [22, 23]. The selected scales are reliable, valid and well established for assessing the subjective health of patients with musculoskeletal and cardiovascular diseases. They are also included in the patient questionnaire used in the quality-assurance programme for medical rehabilitation clinics under the purview of German statutory pension funds [17].
Our re-analysis focused on these three scales because they were the only ones in the original study to which all three methods of change measurement were applied using the same number of items and equivalent item content. An item of the sleep scale concerning disturbed sleep provides an illustrative example. Patients were asked about the extent to which their sleep was disturbed both before (t0) and after (t1) rehabilitation. At t1, they were also asked either how any problem they had with disturbed sleep had changed (direct measurement of change) or to rate the extent to which their sleep had been disturbed at t0 (retrospective pre or then-test).
Analysis
The differences in sample characteristics between the two randomised groups were analysed by means of χ²-tests and t-tests for independent samples, depending on the scale of measurement of each variable.
In order to base all analyses of change on the same data, we included only those patients in the analyses who had provided valid data on the pre- and post-status scores in addition to providing either a retrospective pre score or a score for direct measurement of change for each of the three subscales (IRES sleep subscale, SF-36 physical functioning scale, SCL-90-R somatisation scale).
Three different change scores were calculated for each scale (“sleep function”, “physical functioning” and “somatisation”): The change scores for the indirect measures of change were calculated by subtracting the pre scale score at t0 from the post scale score at t1 (post − pre). The quasi-indirect measures of change were calculated by subtracting the retrospective pre-scale score referring to t0 from the post scale score at t1 (post − retrospective pre).
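The two difference-based change scores described above can be sketched as follows. This is a minimal illustration only: the function names and the example responses are hypothetical, and scoring each scale as the mean of its items is an assumption, since the text does not state whether scale scores were means or sums.

```python
from statistics import mean


def scale_score(item_responses):
    """Scale score as the mean of the item responses.

    Assumption: the text does not say whether scales were scored
    as item means or item sums.
    """
    return mean(item_responses)


def indirect_change(pre_items, post_items):
    """Indirect change: post score at t1 minus pre score at t0."""
    return scale_score(post_items) - scale_score(pre_items)


def quasi_indirect_change(retro_pre_items, post_items):
    """Quasi-indirect change (then-test): post score at t1 minus the
    retrospective pre score, collected at t1 but referring to t0."""
    return scale_score(post_items) - scale_score(retro_pre_items)


# Hypothetical four-item responses for a single patient (made-up values).
pre = [4, 5, 4, 3]
retro_pre = [5, 5, 4, 4]
post = [3, 3, 2, 2]

print(indirect_change(pre, post))              # -1.5
print(quasi_indirect_change(retro_pre, post))  # -2.0
```

Both scores share the same post measurement and differ only in which pre score is subtracted, so any discrepancy between them reflects the difference between the original and the recalled pre status.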
For each item, the response format for the direct measures of change comprised five categories (1 = markedly better, 2 = slightly better, 3 = no change, 4 = slightly worse, 5 = markedly worse). We first calculated the mean of the single-item ratings that belong to one of the three outcome scales (sleep, physical functioning, somatisation). The resulting score for the direct measures of change is therefore not a single-item rating, as is often the case for transition ratings, but is based on the same number of items as the scores calculated for the indirect and quasi-indirect measures of change. Then we transformed this mean score by subtracting 3, yielding a score that ranged from −2 (worst change possible) to +2 (best change possible). This direct-change score thus has a theoretical range of four scale points and is centred around 0 (no change). The reliability of the status measurements, retrospective pre scores (then-test) and scores for direct measures of change was calculated using Cronbach’s alpha.
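A minimal sketch of the direct-change score and of Cronbach's alpha is given below. The function names and the ratings are hypothetical. Because the items are coded from 1 (markedly better) to 5 (markedly worse), the sketch computes 3 minus the mean rating so that +2 is the best and −2 the worst possible change, as in the range described above; this orientation is an assumption of the sketch.

```python
from statistics import mean, variance


def direct_change_score(item_ratings):
    """Mean transition rating recentred on 0 (no change).

    Ratings are coded 1 (markedly better) to 5 (markedly worse).
    Assumption: 3 - mean is used so that +2 is the best possible
    change and -2 the worst, matching the range given in the text.
    """
    return 3 - mean(item_ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (sum) score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical transition ratings: five patients, four items (made-up values).
ratings = [
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 2, 3, 3],
]

scores = [direct_change_score(row) for row in ratings]
alpha = cronbach_alpha(ratings)
print(scores)
print(round(alpha, 3))
```

Averaging over all items of a scale before recentring keeps the direct-change score comparable, in terms of the number of contributing items, with the difference-based scores.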
The effect size for the direct measurement of change (transition rating) was calculated by dividing the mean change score by its standard deviation. Effect sizes for the indirect and quasi-indirect measures of change were calculated as standardised response means, (M_t1 − M_t0) / SD_diff(t1 − t0), where SD_diff is the standard deviation of the individual difference scores [24]. In theory, the standard deviation of the transition ratings represents the standard deviation of a change score. The standardised response mean, which uses the standard deviation of the difference between the scores assessed at the two time points as its denominator, should therefore be the most suitable equivalent of the effect size calculated for the transition ratings.
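The two effect-size definitions can be sketched as follows. The function names and the scores are hypothetical, and the sketch assumes complete paired data, in which case the mean of the individual differences equals the difference of the means.

```python
from statistics import mean, stdev


def standardised_response_mean(pre_scores, post_scores):
    """SRM: mean of the paired differences (post - pre) divided by the
    sample standard deviation of those differences."""
    diffs = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(diffs) / stdev(diffs)


def direct_change_effect_size(change_scores):
    """Effect size for transition ratings: mean direct-change score
    divided by its sample standard deviation."""
    return mean(change_scores) / stdev(change_scores)


# Hypothetical scale scores for five patients (made-up values).
pre = [4, 5, 3, 4, 5]
post = [3, 3, 3, 2, 4]
direct = [1.0, 1.5, 0.0, 2.0, 0.5]

srm = standardised_response_mean(pre, post)
es_direct = direct_change_effect_size(direct)
print(round(srm, 3), round(es_direct, 3))
```

For the quasi-indirect measure, the same function would be called with the retrospective pre scores in place of the pre scores collected at t0.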
The level of agreement between the indirect, quasi-indirect and direct measures of change (question 1) was quantified using Pearson product–moment correlation coefficients. The status measures on which the indirect and quasi-indirect measures of change were based shared the same scale, whereas the direct measures of change used a different scale. For the indirect and quasi-indirect measures of change, we therefore also calculated the intra-class correlation coefficient (ICC) between the pre-test and post-test measures to analyse the level of absolute agreement of both scales, in addition to the Pearson product–moment correlation coefficient. The ICC was not suitable for assessing the agreement of the direct measures of change with the other measures of change.
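The difference between the two agreement statistics can be illustrated with a short sketch. The text does not specify which form of the ICC was used; the two-way, single-measure, absolute-agreement form, ICC(A,1), is assumed here, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_absolute_agreement(x, y):
    """Two-way, single-measure, absolute-agreement ICC for two measures.

    Assumption: ICC(A,1); the text does not state the ICC form used.
    """
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-measure mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


# Hypothetical scores: the second measure is the first shifted by one point.
first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]

r = pearson_r(first, second)
icc = icc_absolute_agreement(first, second)
print(round(r, 3), round(icc, 3))
```

In this constructed example the correlation is perfect while the ICC is lower, because a systematic shift between two measures on the same scale reduces absolute agreement but leaves the linear association untouched. This is also why an absolute-agreement coefficient only makes sense for measures that share a scale.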
The degree of recall bias (question 2) was estimated using the correlation between the score at t0 and the retrospective pre score assessed at t1 (then-test). A correlation coefficient with a value near the reliability of the two assessments indicates a low recall bias.
The present-state effect (question 3) was analysed according to the approach used by Guyatt et al. [7]. We calculated the correlations between the pre measures and their corresponding transition-rating scores as well as between the post measures and their corresponding transition-rating scores. Each transition-rating score was then used as the dependent variable in a linear regression model. We entered the post scores into the regression model first and the corresponding pre scores second. This procedure allowed us to determine what percentage of variance was explained by the post scores alone and what additional percentage could then be explained by the pre scores. A beta coefficient that is larger for the post score than for the pre score indicates a present-state effect. If a pre score accounts for a substantial amount of variance, this indicates that the status at t1 (the “present state”) does not override the information on the patient's pre status at t0, which is necessary to make a sound judgement of change.
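The two-step regression can be sketched using the closed-form solution for a model with two predictors, in which the standardised beta coefficients and the explained variance follow directly from the three pairwise correlations. The function names and data are hypothetical, and the sketch is not the software procedure used in the original analysis.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def two_step_regression(transition, post, pre):
    """Step 1: transition ~ post. Step 2: transition ~ post + pre.

    Returns standardised betas from step 2, R^2 of both steps and the
    R^2 increment attributable to the pre score.
    """
    r_y_post = pearson_r(transition, post)
    r_y_pre = pearson_r(transition, pre)
    r_post_pre = pearson_r(post, pre)
    denom = 1 - r_post_pre ** 2  # requires post and pre not perfectly collinear
    beta_post = (r_y_post - r_y_pre * r_post_pre) / denom
    beta_pre = (r_y_pre - r_y_post * r_post_pre) / denom
    r2_step1 = r_y_post ** 2
    r2_step2 = beta_post * r_y_post + beta_pre * r_y_pre
    return {
        "beta_post": beta_post,
        "beta_pre": beta_pre,
        "r2_step1": r2_step1,
        "r2_step2": r2_step2,
        "r2_change": r2_step2 - r2_step1,
    }


# Hypothetical scores for six patients (made-up values).
transition = [1.5, 0.5, 1.0, -0.5, 0.0, -1.0]
post = [2, 3, 2, 5, 4, 6]
pre = [4, 4, 5, 5, 3, 6]

result = two_step_regression(transition, post, pre)
for name, value in result.items():
    print(name, round(value, 3))
```

The signs of the beta coefficients depend on how the scales are coded, so a comparison of the post and pre coefficients would typically consider their magnitudes together with the variance increment in the second step.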