Minimal important differences for fatigue patient reported outcome measures—a systematic review
BMC Medical Research Methodology volume 16, Article number: 62 (2016)
Fatigue is the most frequent symptom reported by patients with chronic illnesses. As a subjective experience, fatigue is commonly assessed with patient-reported outcome measures (PROMs). More than 40 generic and disease-specific PROMs for assessing fatigue are currently in use. The interpretation of changes in PROM scores may be enhanced by estimates of the so-called minimal important difference (MID). MIDs are not fixed attributes of PROMs but rather vary in relation to estimation method, clinical and demographic characteristics of the study group, etc. The purpose of this paper is to compile published MIDs for fatigue PROMs, spanning diagnostic/patient groups and estimation methods, and to provide information relevant for appraising their appropriateness for use in specific clinical trials and in monitoring fatigue in defined patient groups in routine clinical practice.
A systematic search of three databases (Scopus, CINAHL and Cochrane) was conducted for studies published between January 2000 and April 2015, using fatigue and variations of the term MID, e.g. MCID, MIC, etc. Two authors screened search hits and extracted data independently. Data regarding MIDs, anchors used and study designs were compiled in tables.
Included studies (n = 41) reported 60 studies or substudies estimating MID for 28 fatigue scales, subscales or single item measures in a variety of diagnostic groups and study designs. All studies used anchor-based methods, 21/60 measures also included distribution-based methods and 17/60 used triangulation of methods. Both similarities and dissimilarities were seen within the MIDs.
Magnitudes of published MIDs for fatigue PROMs vary considerably. Information about the derivation of fatigue MIDs is needed to evaluate their applicability and suitability for use in clinical practice and research.
Fatigue is among the most frequent complaints reported by patients with chronic illnesses [1–4] and has far-ranging, often debilitating consequences for their wellbeing and physical, emotional and social functioning. Although there is no consensus definition of fatigue, it is often described as ‘a persistent, overwhelming sense of tiredness, weakness or exhaustion resulting in a decreased capacity for physical and/or mental work’. Fatigue is a subjective experience and is commonly assessed by means of patient-reported outcome measures (PROMs). PROMs are widely used today in evaluating the effects of illness and treatment on symptoms, functioning, and other outcomes from the patient’s perspective.
Some 40 generic and disease-specific PROMs for assessing fatigue are currently in use. Most of these fatigue measures have been evaluated regarding various aspects of validity and reliability. Although these are important psychometric properties reflecting the quality of the measure, they are of little value in interpreting the meaning of scores derived from that measure. Nonetheless, interpretation of scores, in particular changes in scores, is of critical concern in trials evaluating effects of treatments aimed at reducing fatigue, as well as in routine clinical practice in monitoring and managing fatigue in individual patients. In clinical trials, it has long been recognized that conventional statistical significance testing provides information regarding the probability that an effect exists, not about the meaningfulness of the size of the effect. In clinical practice, difficulties in evaluating and interpreting changes in PROM scores often limit their usefulness in informing clinical decision-making.
The interpretation of changes in PROM scores may be enhanced by estimates of the so-called minimal important difference (MID). MID was originally defined over 25 years ago as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management”. During the past decades considerable research attention has been directed towards deriving MIDs for PROMs. In this pursuit a variety of methods have been developed and applied, but no clear consensus exists regarding which method or methods are most suitable.
To date, two main methods have been applied, namely anchor-based approaches and distribution-based approaches. Descriptions of these methods are beyond the scope of this paper and are summarized in detail elsewhere. Briefly, anchor-based approaches use various external criteria (patient-reported, physician-reported, or clinical anchors) to interpret whether a particular magnitude of change is important. For example, a common anchor-based method involves the use of global rating scales (GRS) where MIDs are derived by comparing patients’ self-ratings of change (e.g., from “much worse” to “much better”) to change in PROM scores. The MID is often defined as lying within the range of “slightly worse/better” on the GRS. Distribution-based approaches rely on the statistical characteristics of the distribution of scores in the sample, in which the magnitude of change is generally expressed as a function of the standard deviation (SD) of scores alone or in combination with the reliability of the PROM (standard error of measurement, SEM). Various SD and SEM cut-off values have been proposed for estimating MIDs, including 1/2 or 1/3 SD and 1–2 SEM. Another commonly applied method is the use of effect sizes (ES) or standardized response means (SRM), where change scores are divided by the SD at baseline or the SD of change, respectively. The MID is often defined as change values lying within the range of 0.2–0.5. A disadvantage of distribution-based approaches is that they do not address the clinical importance of the change. Recent recommendations have proposed that, as a first-line method, multiple anchor-based approaches should be used, which, supported by distribution-based methods, may be triangulated to a single MID value or smaller range of values [14–17].
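The distribution-based quantities described above can be illustrated with a minimal sketch. It assumes paired baseline and follow-up scores and a known reliability coefficient, and it uses the conventional formula SEM = SD × sqrt(1 − reliability), which is not spelled out in this paper. The function name, dictionary keys and example scores are hypothetical.

```python
import math
import statistics


def distribution_based_mids(baseline, followup, reliability):
    """Return common distribution-based MID candidates for paired scores.

    Assumption: `reliability` is a coefficient between 0 and 1
    (e.g. test-retest reliability or Cronbach's alpha).
    """
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must be paired")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")

    changes = [f - b for b, f in zip(baseline, followup)]
    sd_baseline = statistics.stdev(baseline)  # sample SD at baseline
    sd_change = statistics.stdev(changes)     # sample SD of change scores
    mean_change = statistics.mean(changes)

    # Conventional SEM formula (an assumption, not quoted from the paper).
    sem = sd_baseline * math.sqrt(1.0 - reliability)

    return {
        "half_sd": 0.5 * sd_baseline,
        "third_sd": sd_baseline / 3.0,
        "one_sem": sem,
        "two_sem": 2.0 * sem,
        # Effect size: mean change divided by the SD at baseline.
        "effect_size": mean_change / sd_baseline,
        # Standardized response mean: mean change divided by the SD of change.
        "srm": mean_change / sd_change,
    }


# Hypothetical fatigue scores for five patients (illustrative only).
baseline = [10, 20, 30, 40, 50]
followup = [12, 25, 31, 47, 55]
estimates = distribution_based_mids(baseline, followup, reliability=0.84)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

None of these values says anything about clinical importance, which is why they are usually reported alongside anchor-based estimates rather than on their own.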
Although appealing for its simplicity, the idea of a single, universal MID value for any particular PROM remains elusive for a number of reasons. Firstly, different MID estimation approaches have been shown to yield highly disparate MIDs and hence triangulation (combining different methods to estimate a MID) may be problematic. Secondly, MIDs have also been shown to differ by population and context. For example, MIDs vary by diagnostic group; characteristics of the study sample, e.g., demographics and baseline levels; disease severity; treatment; and choice of anchors [18, 19], as well as by whether MIDs gauge improvement or deterioration. This variability suggests the need to understand how a particular MID value was determined in order to judge its appropriateness for use in research for interpreting change and/or computing sample sizes, or in clinical practice for monitoring fatigue in specific patient groups.
The purpose of this paper is to compile published MIDs for fatigue PROMs, spanning diagnostic/patient groups and estimation methods, and to provide information relevant for appraising their appropriateness for use in specific clinical trials and in monitoring fatigue in defined patient groups in routine clinical practice.
A systematic literature review was conducted in which three databases (Scopus, CINAHL and Cochrane) were searched from January 2000 to April 2015 to identify studies with calculated MIDs for fatigue scales, subscales and single item measures. The searches were limited to the English language (search string: (“minimal clinical important difference*” OR “minimal important difference*” OR “minimal clinically important difference*” OR “minimally important difference*” OR “clinical important improvement*” OR “clinically important improvement*” OR “minimal important clinical difference*” OR “minimally important clinical difference*” OR “responder definition”) AND Fatigue). The search was augmented by screening article reference lists. All expressions including “difference/change/improvement” or equivalent, “important” as well as “minimal” or “clinical”, or “responder definition” were defined as MIDs. To facilitate reading, all minimally important changes are called MIDs in this paper.
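For readers who wish to rerun or adapt the search, the Boolean string can be assembled programmatically. The sketch below only reproduces the phrases listed above; the helper name `build_query` is hypothetical, and database-specific field tags or limits (e.g. language, date range) are not included because they are not given here.

```python
# MID phrases exactly as listed in the search string above.
MID_TERMS = [
    "minimal clinical important difference*",
    "minimal important difference*",
    "minimal clinically important difference*",
    "minimally important difference*",
    "clinical important improvement*",
    "clinically important improvement*",
    "minimal important clinical difference*",
    "minimally important clinical difference*",
    "responder definition",
]


def build_query(mid_terms, topic):
    """Join quoted phrases with OR, then combine the group with the topic via AND."""
    phrases = " OR ".join(f'"{term}"' for term in mid_terms)
    return f"({phrases}) AND {topic}"


query = build_query(MID_TERMS, "Fatigue")
print(query)
```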
Selection of articles
Inclusion criteria were reporting of MIDs in text and/or tables for a fatigue scale, subscale or single item measurement of fatigue. Exclusion criteria were: the reported MID was not derived directly in the study; insufficient information was supplied about the study sample, study design and/or method for determining the MID; the study sample was younger than 18 years; MIDs were not reported separately for a fatigue subscale; and the publication was a conference abstract. Exclusion at the title/abstract and full-text levels was done independently by two researchers (ÅN and AD), see Fig. 1.
Two authors (ÅN and AD) extracted data regarding MIDs and methods used, including anchors used. The last author (AD) checked all data extraction and prepared the tables. To facilitate interpretation, all MIDs are shown as absolute values and rounded to one decimal place, except for effect sizes. Some studies reported standard deviations (SD) and confidence intervals, but these are not shown in our tables or text. The fatigue measures were classified as multidimensional scales, unidimensional scales or subscales, single item measures or item bank scales.
The literature search generated 177 articles (Fig. 1), of which 41 met the inclusion criteria [22–62]. The main reasons for exclusion were: reported MID was not derived in the study; and inadequate information was supplied about the study sample, study design and/or method for determining the MID. Many different expressions were used to name a small but important change in fatigue. In this review we included studies using different phrases for MID (see Table 1), e.g. “MID”, “MCID”, “MCII” or an equivalent expression, all referred to as MID in this paper. Most of these expressions used some variation of “difference/change/improvement” or equivalent, “important” as well as “minimal”. Some phrases also included “clinical”. Two studies used “responder definition” [43, 55], see Table 1. In two systematic reviews a phrase without “minimal” was used [59, 60] but the authors defined values for a small or minimal change.
The included articles (n = 41) reported MIDs for 28 fatigue PROMs (characteristics shown in Table 2), resulting in 60 studies/substudies of MIDs. The studies varied in sample size, diagnostic group, MID estimation approach, study design, type of intervention and length of follow up. Sample sizes ranged from n = 40 to n = 2,583. Sixteen different diagnoses were included in the reviewed studies. Twenty-seven of the studies in the 41 articles were longitudinal and follow-up periods ranged from two days after intervention to one year after baseline. An anchor-based approach alone was used in 39 of the 60 studies or substudies estimating MID, while the rest also used a distribution-based approach. Seventeen of these also included a method of triangulation to define MIDs. Two cross-sectional studies [33, 46] reported MIDs for seven fatigue or vitality scales (MFI, FSS, MAF, CFS, VT/SF-36, FACIT-F and GRS). Other studies determined MIDs for two or more fatigue measures or subscales [28, 47, 48, 51, 59–61]. Several PROMs had MIDs determined in a number of different studies and several studies reported MIDs for up to seven PROMs. Nevertheless, most MIDs were derived in single studies, with one study per PROM [22–27, 29–32, 34–43, 45, 49, 50, 52–58, 62], see Table 3. Altogether, 60 studies or substudies estimating MIDs for global change (not specified direction of change), improvement and/or deterioration are described in Table 3. In Table 3 all score changes are presented as positive values, regardless of the direction of change. Confidence intervals and SDs (if derived in study) are not shown. Numbers are rounded to one decimal place.
Multidimensional fatigue inventory (MFI), score 20–100
Two cross-sectional studies [33, 46] derived MIDs for the MFI total scale in systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) populations, using a patient global rating scale and interviews as anchors. MIDs ranged from 11.5 to 13.3 for global change, 6.8 to 9.6 for improvement and 9.5 to 12.8 for deterioration.
Fatigue severity scale (FSS), score 1–7
Three cross-sectional studies reporting MIDs for the FSS were identified [33, 46, 50]. Diagnostic groups included SLE, RA and multiple sclerosis (MS). Anchor-based approaches were applied in all three studies and a distribution-based approach (viz. effect size, ES, of at least 0.25) was also applied in one. Two used a patient global rating scale as an anchor [33, 46], whereas the third used clinical anchors and baseline data from a clinical trial to establish MIDs. MIDs ranged from 0.5 to 1.2 for global change, 0.08 to 0.4 for improvement and 1.0 to 1.2 for deterioration.
Multidimensional assessment of fatigue (MAF), score 1–50
MIDs for the MAF were estimated in two cross-sectional studies with SLE and RA patients [33, 46], using a patient global rating scale. Estimates were 5.0 and 9.2 for global change, 1.4 to 5.4 for improvement and 8.3 to 8.9 for worsening.
Chalder fatigue scale (CFS), score 0–33
Fatigue impact scale (FIS), score 0–160
One cross-sectional study with MS patients reported MIDs for the FIS ranging from 9 to 24 points for the different patient and clinician rating anchors, with a mean of 15.5 and an SD of 4.9. Distribution-based methods yielded MIDs ranging from 4.8 to 17.3 (1–2 SEM; ± 1/3–1/2 SD). Triangulation of anchor and distribution-based methods gave a MID range of 10–20 points.
Trial outcome index-fatigue (TOI-F), score 0–108
One study reported TOI-F MIDs using data from three separate cancer trials. Triangulation was used to determine a MID, combining a patient-reported anchor, two physician-reported anchors (including response to treatment ratings), and one clinical anchor (haemoglobin level). MID estimates ranged from 4.8 to 26.6, and a single triangulated MID of 5.0 was recommended.
Perform questionnaire (PQ), score 12–60
One longitudinal study estimated the PQ MID in cancer patients to be 3.7 for improvement. Triangulation was used to estimate a recommended MID of 3.5.
Schwartz cancer fatigue scale (SCFS), score 3–30
A longitudinal study of the SCFS using a patient-rated anchor reported MIDs of 5.0 for global change, 2.1 for improvement and 5.7 for deterioration after a two-day follow-up.
Fatigue associated with depression questionnaire (FAsD), score 1–5
MIDs for the FAsD, estimated in one longitudinal study of patients with a clinical diagnosis of depression, ranged from 0.3 to 0.6 for improvement and from 0.2 to 0.3 for worsening after 6 weeks of follow-up.
Neurological fatigue index for multiple sclerosis (NFI-MS), summary score 0–30
One longitudinal study using a patient global assessment of change reported MIDs for the NFI-MS: 2.5 for the ten-item Summary scale, 2.4 for the Physical scale (score range 0–24) and 0.8 for the Cognitive scale (score range 0–12).
Unidimensional scales or subscales
Multidimensional fatigue inventory (MFI) subscales, score 4–20
A longitudinal study derived MIDs for the five MFI subscales in a cancer population (pre and post radiotherapy). MIDs ranged from 1.4 to 2.4 depending on subscale. A general MID of 2 points was recommended for all MFI subscales.
Unidimensional fatigue impact scale (U-FIS), score 0–66
One longitudinal study using the EQ-5D as an anchor derived MIDs in an MS sample. U-FIS MIDs were 6.5 for improvement and 4.7 for deterioration, and distribution-based MIDs ranged between 2.4 and 7.0.
Fatigue assessment scale (FAS), score 10–50
MIDs for the FAS were reported in one longitudinal study of sarcoidosis patients using the WHOQOL-BREF physical health domain and a ROC curve as anchors, as well as distribution-based methods. MIDs ranged between 3.0 and 4.2 and a triangulated MID of 4 was suggested.
Vitality scale (VT) of the medical outcome study SF-36 health survey (SF-36), score 0–100
Eight studies [26, 33, 35, 46, 54, 56, 59, 60] determined MIDs for the VT scale of the SF-36 using different designs and diagnostic groups: longitudinal studies with patient- and/or clinician-rated anchors, cross-sectional studies using patient-rated anchors, and systematic reviews using combined study data and expert panels. The MIDs ranged from 7.3 to 11.3 for improvement, 11.9 to 18.3 for worsening, 3.5 to 20 for global change and 4.2 to 18.8 for triangulated MIDs.
FACIT fatigue scale (FACIT-Fatigue), score 0–52
Six cross-sectional or longitudinal studies [28, 29, 33, 38, 46, 48] reported MID estimates derived in patients with cancer, SLE, or RA using patient or clinician-rated anchors. In these studies, MIDs varied from 3 to 8.3 irrespective of direction of change, 2.8 to 6.8 for improvement and 5.2 to 9.1 for deterioration. Two of the studies [29, 38] combined various distribution-based approaches (SEM, SD and ES), resulting in MIDs ranging between 2.2 and 6.8, and presented triangulated MIDs ranging between 3 and 6.
FACT-an fatigue subscale (FACT-An Fatigue), score 0–80
One longitudinal study estimated a MID for improvement of 4.2 in cancer patients using haemoglobin level as a clinical anchor and regression analysis to calculate the MID.
Profile of mood states short form fatigue subscale (POMS-F), score 0–28
One longitudinal study reported MIDs for the POMS-F using a sample of cancer patients undergoing chemotherapy. A global MID of 5.6 points was determined as well as separate MIDs for improvement (2.1 points) and deterioration (5.7 points).
European organization for research and treatment of cancer quality of life questionnaire core 30 (EORTC QLQ-C30), fatigue scale, score 0–100
Six cross-sectional and longitudinal studies [24, 25, 36, 40, 41, 62] reported MIDs derived in a variety of cancer diagnoses. MIDs were reported as 11.4 to 17.3 points for improvement and 5.7–24.5 points for deterioration. Distribution-based MIDs ranged from 3.0 to 19.7.
Sleep impact scale (SIS), energy/fatigue and mental fatigue subscales, score 0–100
One longitudinal study using a clinician-rated anchor and a distribution-based method to assess change at 8-week follow-up reported MIDs derived in patients with major depressive disorder (MDD). The anchor-based approach yielded a MID of 11.9 for the Energy/Fatigue subscale, whereas the distribution-based MID was 8.7. The corresponding MIDs for the Mental Fatigue subscale were 13.3 and 10.6, respectively.
Chronic respiratory questionnaire (CRQ), score 1–7
Two systematic reviews [52, 59] used CRQ data from earlier studies to determine MIDs for the CRQ/Fatigue subscale and triangulated MIDs of 0.5 and 2 were proposed. One of the reviews estimated MIDs of 0.5–0.6 for global change and distribution-based MIDs of 0.47–0.54.
Chronic heart failure questionnaire (CHQ), score 4–28
One systematic review using CHQ data and an expert panel proposed a MID for the CHQ/Fatigue subscale of 3–4 irrespective of direction and a triangulated MID of 3.
Quality of life inventory in Epilepsy (QOLIE-31), energy/fatigue subscale, score 0–100
One longitudinal study used three randomised controlled trials to examine the MID for the QOLIE-31 Energy/fatigue subscale. A MID of 7.5 was defined using a patient rating of change and regression analysis. Distribution-based MIDs ranged between 5.4 and 9.4.
Visual analogue scale (VAS), score 0–100 or 0–10
Six longitudinal studies [30, 32, 37, 53, 57, 58] derived MIDs for the VAS 0–100 and one for the VAS 0–10 in a variety of diagnostic groups. MIDs for the VAS-100 ranged from 1.4 to 13.9 for improvement and 3.6 to 15.2 for deterioration, while MIDs for global change varied between 6.7 and 17. One study determined a triangulated MID of 10 using the Delphi method. MIDs for the VAS-10 ranged from 0.8 to 1.1 for improvement and 1.1 to 1.3 for worsening, and were derived from three different anchors and at different follow-up times in three different diagnostic groups (RA, SLE and cancer).
Global rating scale (GRS), score 0–10
MIDs for the single item GRS were determined in SLE, RA and cancer patients in two cross-sectional studies [33, 46] and one longitudinal study, all using a patient global rating scale as an anchor. Global MIDs ranged from 1.1 to 2.0, while MIDs for improvement were 0.3 to 0.9 and for deterioration 1.5.
Edmonton symptom assessment system (ESAS) fatigue item, score 0–10
Two longitudinal cancer studies [23, 48] identified MIDs for the fatigue item in the ESAS scale. MIDs for improvement ranged from 0.1 to 4 and between 1.0 and 1.8 for worsening of fatigue. Distribution-based MIDs ranged from 0.1 to 1.4.
Immune thrombocytopenic purpura patient assessment questionnaire (ITP-PAQ), fatigue subscale, score 0–100
One longitudinal study assessed MIDs for the ITP-PAQ Fatigue subscale using patient impression of change. The MID for global change was defined as 15.0 points or as an effect size of 0.57.
PROMIS fatigue item bank scales
17-item PROMIS Fatigue (Fatigue-17) and 7-item PROMIS Fatigue (Fatigue-7), score 17–85 and 7–35
One study derived MIDs for both the PROMIS Fatigue-17 and Fatigue-7 in patients with cancer. The study used both cross-sectional and longitudinal data as well as anchor-based and distribution-based methods. Distribution-based MIDs were reported as effect sizes. For the Fatigue-17, the ES ranged from 0.34 to 0.79 and from 0.27 to 0.52 for cross-sectional and longitudinal designs, respectively. Corresponding effect sizes for the Fatigue-7 were 0.24–0.76 and 0.24–0.51. Triangulated raw score MIDs ranged from 4.0 to 8.0 for the Fatigue-17 and 2.0 to 3.0 for the Fatigue-7, while t-score MIDs varied from 2.5 to 4.5 for the Fatigue-17 and 3.0 to 5.0 for the Fatigue-7.
This systematic review identified 41 studies reporting MIDs for 28 fatigue PROMs or subscales measuring fatigue, yielding a total of 60 studies or substudies estimating MID. It is important to note that there are many more fatigue PROMs available today than the 28 reported here. For example, a critical review of fatigue PROMs from 2009 identified 39 such PROMs; however, only 11 of these overlapped with PROMs in our review. This suggests that there are roughly 56 or more fatigue PROMs currently represented in the literature. Considering the importance attributed to MIDs for interpreting the meaningfulness of change in PROM scores [9, 63], it is somewhat surprising that MIDs are available for only about half of all published fatigue PROMs. Moreover, few PROMs had MIDs that were determined in more than two studies and diagnostic groups, and more than half of the PROMs had MIDs that were derived in only one diagnostic group. Important exceptions were the SF-36 Vitality scale (>8 diagnostic groups/8 studies); the FACIT-Fatigue scale (4 diagnoses/6 studies); the EORTC QLQ-C30 Fatigue subscale (6 cancer diagnoses/6 studies); and the VAS-100 Fatigue (6 diagnoses/6 studies). Considering that these scales are some of the oldest and most widely used PROMs, it is unsurprising that greater research attention has focused on determining MIDs for them; however, it is noteworthy that so few separate studies reported MIDs for commonly used generic fatigue PROMs, such as the MFI, FSS, FIS and FAS.
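The estimate of roughly 56 fatigue PROMs follows from counting the two sets of measures without double-counting their overlap. As a worked check, let A denote the PROMs with MIDs in this review and B the PROMs identified in the 2009 review (A and B are labels introduced here for illustration only):

```latex
\[
|A \cup B| = |A| + |B| - |A \cap B| = 28 + 39 - 11 = 56,
\qquad
\frac{|A|}{|A \cup B|} = \frac{28}{56} = 0.5
\]
```

The ratio on the right corresponds to the statement that MIDs are available for about half of the published fatigue PROMs. Because other PROMs may exist outside both sets, 56 is a lower bound.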
Previous research has highlighted considerable variability in MID values as a function of estimation method, population and context [14, 18, 19], suggesting the importance of considering such factors when appraising the appropriateness of published MIDs for use in clinical research and practice. In line with this, substantial variation was observed in MID values for individual fatigue PROMs in this review. For example, MIDs for the SF-36 Vitality scale ranged from as low as 4.2 to as high as 20.0 points (0–100 point scale) in studies varying in methodologies, anchors, diagnostic groups and direction of change assessed. Similarly, MIDs for the VAS-100 Fatigue scale ranged from 1.4 to 17. MIDs for the cancer-specific EORTC QLQ-C30 fatigue scale also varied between 1.8 and 24.5 points (0–100 scale) and those for the FACIT-Fatigue scale ranged between 6 and 16 (converted to percent), see Table 3. This wide variation in MIDs for individual fatigue scales suggests the importance of understanding how any particular MID was derived and of applying this knowledge when appraising its appropriateness for interpreting changes in fatigue scores.
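The conversion to percent is not spelled out in the text. One reading that reproduces the reported figures is to express each MID as a percentage of the scale's score range; this formula is an assumption, and the function name is hypothetical. The sketch below checks it against the FACIT-Fatigue values (score 0–52) and the TOI-F values (score 0–108) reported in the Results.

```python
def mid_as_percent_of_range(mid, scale_min, scale_max):
    """Express a raw-score MID as a percentage of the scale's score range.

    Assumption: percent = 100 * MID / (scale_max - scale_min).
    """
    span = scale_max - scale_min
    if span <= 0:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * mid / span


# FACIT-Fatigue, score 0-52: raw MIDs of 3 and 8.3 irrespective of direction.
print(round(mid_as_percent_of_range(3, 0, 52)))
print(round(mid_as_percent_of_range(8.3, 0, 52)))

# TOI-F, score 0-108: raw MID estimates of 4.8 and 26.6.
print(round(mid_as_percent_of_range(4.8, 0, 108), 1))
print(round(mid_as_percent_of_range(26.6, 0, 108), 1))
```

Putting MIDs on a common 0–100 footing in this way allows scales with different score ranges to be compared, but it does not remove differences that stem from anchors, populations or study design.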
MID estimation methods varied considerably in the identified studies and substudies. However, in accordance with recent recommendations regarding methods for MID estimation, nearly all studies applied an anchor-based approach, where at least one anchor was used. Patient global change ratings were by far the most common anchor, but clinician-reported and clinical anchors were also used. Where more than one anchor was applied, either a range of values was generally reported or, as recommended [14, 63, 64], values were often triangulated to a single or smaller range of MIDs. Distribution-based methods were used in about a third of the studies and only in conjunction with anchor-based approaches. A few studies used a Delphi method (Table 3).
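As an illustration of the most common anchor, the sketch below shows one widely used variant of the patient global rating approach: the MID is taken as the mean change in PROM score among patients who rate themselves as minimally changed. The category labels, function name and data are hypothetical, and individual studies in this review may have used other cut-offs or statistical methods (e.g. ROC curves or regression).

```python
from statistics import mean


def anchor_based_mids(records,
                      improved_label="slightly better",
                      worsened_label="slightly worse"):
    """Estimate MIDs from (change_score, global_rating) pairs.

    Returns absolute mean changes, mirroring how MIDs are presented as
    positive values in this review. A value is None if no patient chose
    the corresponding rating category.
    """
    improved = [change for change, rating in records if rating == improved_label]
    worsened = [change for change, rating in records if rating == worsened_label]
    return {
        "improvement": abs(mean(improved)) if improved else None,
        "deterioration": abs(mean(worsened)) if worsened else None,
    }


# Hypothetical change scores (follow-up minus baseline) with self-rated change.
records = [
    (-4, "slightly better"),
    (-6, "slightly better"),
    (5, "slightly worse"),
    (9, "slightly worse"),
    (0, "unchanged"),
    (-15, "much better"),
]
print(anchor_based_mids(records))
```

Estimating improvement and deterioration separately matters because, as the Results show, the two often differ in magnitude.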
In the studies using several anchors to determine MID values, global MID ranges varied within single studies from as little as two points (percent scores), in relation to the FACIT-Fatigue scale using patient-based anchors, to about 20 points for the TOI-F using patient, clinician and clinical anchors. Interestingly, two studies reporting MIDs for the SF-36 Vitality scale, using the same diagnostic group (RA) but different anchors, yielded two distinct ranges of MIDs. In the study by Kosinski et al., using patient and physician global assessments as anchors, MIDs ranged from 4.9 to 11.1, whereas a range of 11.0 to 20.0 was reported by Ward et al. using the HAQ, CES-D and the SF-36 health transition item. Neither of these studies triangulated the range of values to a single MID or smaller range of values and hence these wide ranges of MIDs are arguably of questionable practical value for interpreting change in fatigue in RA patients as measured with the SF-36 Vitality scale.
Triangulation was used in 17 substudies, of which 10 used more than two anchors. This method has been recommended for consolidating MIDs derived from different methods to a single or small range of MID values. However, it has been criticized since it may in practice involve the need to converge widely disparate MIDs derived using different estimation methods and diverse anchors, which often represent very different stakeholder perspectives. An example of a MID triangulated from a wide range of MIDs is the TOI-F, where a MID range of 4.4–24.6 (percent scores) was triangulated to 5.0. Where MID ranges are smaller, the value and applicability of the triangulated MID may be more immediately apparent. For example, Schünemann et al. reported a MID range for the CRQ of 6.7–8.5 (percent scores), derived from patient anchors, a systematic review and distribution-based methods, which was triangulated to a MID of 6.7.
A second factor known to influence variation in MID values is the patient population in which the MID is determined. Variation by diagnostic group is exemplified by comparing MIDs from two studies, each using the same estimation method (7-step global rating scale) and study design (cross-sectional) but different diagnostic groups [33, 46]. One of the studies determined MIDs for seven different fatigue PROMs in patients with SLE and the other did the same in patients with RA. Comparison of the global MIDs for the SLE and RA patients, shown in Table 3, reveals consistently smaller MIDs for SLE versus RA across all seven PROMs. It is noteworthy that most PROMs had MIDs that were determined in only one patient population and the relevance of these MIDs for use in other patient groups thus remains unclear.
A third factor influencing variation in MID values is the context within which the MID is determined. Context issues concern, for example, characteristics of the patient population, such as baseline state, disease severity, and direction of change [13, 20], as well as study design and intervention. For example, patients with baseline scores indicating more severe fatigue may value magnitudes of change in fatigue differently than those with less severe fatigue. Corroborating previous research findings, MIDs for improvement differed from those for deterioration in all identified studies. MIDs tended to be larger for deterioration than improvement, except in the EORTC QLQ-C30 and VAS Fatigue item. MIDs for improvement were consistently smaller than global MIDs.
A strength of this study is that reported MIDs for fatigue scales or subscales were systematically compiled and described. Assessment for inclusion or exclusion and data extraction from included studies were done independently by two authors (ÅN and AD). A limitation is that the search period was restricted to studies from 2000 onwards and the search strings covering the many variations on MID were also limited; some studies reporting MIDs for fatigue scales may therefore not have been captured in the literature searches. Another limitation is that the descriptions of the study designs and results had to be summarized and simplified in tables, so information may have been lost. Therefore, when evaluating MIDs the original study or studies should be consulted.
MIDs vary substantially by estimation method, patient population and context both across and within fatigue PROMs. In light of this variation, published MIDs should be applied judiciously, after carefully considering their applicability to characteristics of the study in question. The information provided in this paper may serve to aid researchers and clinicians in making informed decisions regarding the appropriateness of published MIDs for their particular study and patients.
ES, effect size; GRS, global rating scale; ITP, immune thrombocytopenic purpura; MCID, minimal clinical important difference; MCII, minimal clinically important improvement; MDD, major depressive disorder; MID, minimal important difference; MS, multiple sclerosis; PRO, patient reported outcome; PROM, patient reported outcome measure; PROMIS, patient-reported outcomes measurement information system; RA, rheumatoid arthritis; SD, standard deviation; SEM, standard error of measurement; SLE, systemic lupus erythematosus; QoL, quality of life.
Funding
This study was funded by the University of Gothenburg Centre for Person-Centred Care (GPCC).
Availability of data and materials
Authors' contributions
ÅN, CT and ÅLN planned the study. ÅN and AD performed the searches, screened all search hits and extracted the data. AD made the tables. All authors contributed to the writing of the manuscript and agreed on its final version.
Competing interests
The authors declare that they have no competing interests.
Consent for publication