
The minimal important difference of patient-reported outcome measures related to female urinary incontinence: a systematic review



The minimal important difference is a valuable metric for ascertaining the clinical relevance of a treatment and offers guidance for patient management. Evidence on this metric for outcomes related to female urinary incontinence is lacking, which might negatively affect clinical decision-making.


To summarize the minimal important difference of patient-reported outcome measures associated with urinary incontinence, calculated according to both distribution- and anchor-based methods.


This is a systematic review conducted according to the PRISMA guidelines. A search strategy including the main terms for urinary incontinence and minimal important difference was run in five databases (Medline, Embase, CINAHL, Web of Science, and Scopus) on 9 June 2021 and updated on 9 January 2024, with no limits on date, language, or publication status. Studies that provided minimal important differences (distribution- or anchor-based methods) for patient-reported outcome measures related to female urinary incontinence were included. Study selection and data extraction were performed independently by two researchers. Only studies that reported the minimal important difference according to anchor-based methods were assessed for credibility and certainty of the evidence. When possible, absolute minimal important differences were calculated for each study separately according to the mean change of the group of participants that slightly improved.


Twelve studies were included. Thirteen questionnaires with their respective minimal important differences reported according to distribution (effect size, standard error of measurement, standardized response mean) and anchor-based methods were found. Most of the measures for anchor methods did not consider the smallest difference identified by the participants to calculate the minimal important difference. All reports related to anchor-based methods presented low credibility and very low certainty of the evidence. We pooled 20 different estimates of minimal important differences using data from primary studies, considering different anchors and questionnaires.


The minimal important difference of patient-reported outcome measures for urinary incontinence varies widely according to the method of analysis, questionnaires, and anchors used; however, the credibility and certainty of the evidence supporting these estimates are still limited.



The International Continence Society defines urinary incontinence as any involuntary loss of urine [1]. Stress urinary incontinence is defined as urine loss associated with coughing, sneezing, effort, or physical exertion; urgency urinary incontinence is defined as loss of urine associated with urinary urgency (a sudden and strong urge to urinate); and mixed urinary incontinence combines stress and urgency incontinence concomitantly [1].

According to the World Health Organization, urinary incontinence affects more than 200 million people worldwide [2, 3] and is more prevalent in women [4]. One in four women will be incontinent at some point in life [4, 5]. The high prevalence of urinary incontinence concerns government institutions because the associated care costs are high, estimated at around 117 million and $66 billion (2007 US dollars) per year in the United Kingdom [6] and the United States of America [7], respectively. Urinary incontinence is associated with impairment of the social, psychological, financial, and sexual aspects of a woman’s life, which in turn can be related to reduced quality of life [8] and self-esteem, and to social isolation [9]. Moreover, urinary incontinence is a predictor of mortality, especially among the elderly [10].

Patient-reported outcome measures and voiding diaries are used to measure the quality of life of patients with urinary incontinence, as well as to quantify urinary loss. In both clinical practice and research, patient-reported outcome measures are useful for reporting the effects of interventions since they take into consideration the patients’ perspective regarding the changes observed after the treatment. However, the interpretation of scientific research results generally focuses on statistical analyses, that is, on whether the result of an intervention can be considered statistically significant [11]. Interpreting p values alone is insufficient to demonstrate the impact of an intervention on the health care of individuals [12, 13], because research findings may be statistically significant without being clinically relevant when the patient did not have a clinically significant improvement [14].

The analysis of clinical significance has increasingly been used in health research because it helps to establish whether the result of a treatment is perceived as beneficial from the perspective of the patient or any other stakeholder [15]. One of the methods used to interpret the clinical relevance of research results is the minimal important difference of clinical outcome measures. The minimal important difference has been defined as “the smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient’s management” [16].

There are two different methods to determine the minimal important difference [17]: (1) Distribution-based methods use statistical calculations based on the distribution of outcome scores to determine how the scores differ among patients [18]. Although these methods are easily applied, they do not evaluate the clinical relevance of the intervention according to the patient's perception [16]. (2) Anchor-based methods take into consideration patients’ perceptions by using interpretive and self-reported tools, such as the global rating of change scale [19,20,21,22], to assess change in the outcome that represents a meaningful degree of change [23]. In this case, the patient has the autonomy to assign a numerical value to the status of the main complaint, according to their own perception. Psychosocial factors, for example, could potentially influence the patient's global status, which may interfere with the variable of interest [16].

Previous systematic reviews have assessed the minimal important difference for outcomes related to the musculoskeletal [24,25,26] and oncological [27] areas, but none has focused on outcomes related to urinary incontinence. This gap has a negative impact on the research field, as it impairs the estimation of sample sizes and the interpretation of the results of clinical trials, and it may lead to over- or underestimation of the clinical significance of studies that have already been published or will be in the future. In addition, the lack of clear guidance on how to interpret the clinical relevance of results from urinary incontinence outcomes does not contribute to evidence-based practice [28]. Synthesizing the evidence about the clinical relevance of instruments related to urinary incontinence may benefit clinicians and researchers [29] and improve decision-making by informing the minimal important difference of specific instruments, which may be used in clinical and scientific practice [30].

Therefore, the aims of the present systematic review were: I) to identify and synthesize all distribution-based and anchor-based methods to estimate minimal important difference for outcome measures related to urinary incontinence; II) to summarize minimal important difference estimates related to the most commonly used outcome measures related to urinary incontinence; III) to determine the credibility of minimal important difference reported in each study.


This is a systematic review conducted according to the PRISMA [31] and COnsensus-based Standards for the selection of health Measurement INstruments [32] guidelines and registered in PROSPERO (protocol CRD42022299686).

Eligibility criteria, information sources, search strategy

The inclusion and exclusion criteria were based on, and adapted from, the PICOS and COSMIN frameworks, as described below:

Population: Women older than 18 years with stress, urge, and/or mixed urinary incontinence according to International Continence Society definitions [1], with a diagnosis of urinary incontinence based on the results of a subjective or objective assessment. Studies were excluded if the aim was to analyze urinary symptoms of children or men, if they included only continent women, and/or if the authors analyzed only other pelvic floor dysfunctions (i.e., fecal and/or anal incontinence, pelvic organ prolapse, sexual dysfunctions).

Intervention/Instruments of interest (construct targeted): Studies were included if they assessed any outcome measure related to urinary incontinence, such as quality of life and/or amount of leakage. We also looked for outcomes that assessed pelvic floor muscle function evaluated by questionnaires or physical tests, including vaginal palpation, dynamometry, vaginal cones, manometry, electromyography, imaging exams, urodynamics, and/or the urine stream interruption test [33]. However, no such studies were found during screening.

Comparison: Not applicable.

Outcomes: Studies that reported minimal important differences that could be derived from distribution- or anchor-based methods as described in a previous study [17] were included. A detailed description of the methods available to determine the minimal important difference in clinical research is presented in Appendix 1.

Study design: Any study generating minimal important differences for urinary incontinence outcomes (randomized controlled trials and controlled trials, secondary analyses of clinical trials, cohort studies, cross-sectional studies, and reliability, responsiveness, and validity studies) was included. The following types of studies were excluded: case reports, reviews, systematic reviews, meta-analyses, commentaries, letters to the editor, conference papers, book chapters, protocol registrations, abstracts without full text, and experimental studies. Reviews were carefully screened for relevant references.

Searches were performed on 9 June 2021 and updated on 9 January 2024, including the main terms for urinary incontinence and minimal important difference. In addition, a search filter focusing on clinical significance keywords obtained from previous publications was used [34] (details available in Appendix 2). Five databases were consulted: Medline (Ovid MEDLINE(R) ALL), Embase (Ovid interface), CINAHL PLUS with Full Text (EBSCOhost interface), Web of Science (Indexes=SCI-EXPANDED, SSCI, A&HCI, ESCI), and Scopus. No limits were applied for date, language, or publication status. A manual search was performed to look for relevant references. Included studies were tracked with the Web of Science database.

Study selection

Results from the searches were compiled in EndNote software and imported to Covidence, which was used during the screening process. Two independent researchers evaluated the studies' eligibility according to the inclusion and exclusion criteria in two sequential evaluation phases: (I) analysis of titles and abstracts; and (II) analysis of full texts. In case of disagreement, a consensus meeting was held. In any case of persistent discrepancy, a third evaluator made the final decision. The PRISMA flowchart [35] presents the results of the selection process.

Data extraction

An Excel form was developed for data extraction. Pilot testing and regular revision through discussions were undertaken to standardize the data extraction form and process. One researcher conducted the data extraction and organized the data in the Excel form, and a second researcher reviewed the extracted data for accuracy and completeness. Disagreements were resolved in consensus meetings.

Data extraction was based on characteristics that included, but were not limited to: 1) article information (first author, year of publication, language, funding, country, aims, study design, and setting); 2) population information (age, diagnosis, tool for the diagnosis, and other conditions or characteristics); 3) outcome measurements (minimal important difference determination (e.g., analytical approach, sample size, duration of follow-up when applicable); minimal important difference estimation methods (distribution- and/or anchor-based; the specific anchor applied during data collection; minimal important difference values); constructs evaluated (e.g., quality of life evaluated according to patient-reported outcome measures, pelvic floor function, urinary loss); tool description (categorical, ordinal, or numerical data); type of outcome (patient-reported outcome measures or physical test)); 4) summary of results (minimal important difference estimation; correlations between the outcome and anchor; precision of the minimal important difference (e.g., 95% confidence interval/minimal important difference × 100); time between baseline and follow-up; directions of both anchor and patient-reported outcome measures (e.g., whether an increase in the scores of both instruments reflects improvement or worsening, or whether the scores of the two instruments have opposite meanings); correlations of the patient-reported outcome measures and the transition item at baseline and follow-up). In case of missing quantitative data, the authors of the primary studies were contacted to obtain the unreported data. When the authors did not answer our request, data were extracted from the graphs available in the studies.

Credibility of minimal important difference estimates

Two independent researchers conducted the credibility assessment of the minimal important difference in each included study that used anchor-based methods. To the authors' knowledge, there is no specific tool to assess the credibility of minimal important differences reported according to distribution-based methods. The credibility was evaluated separately for each minimal important difference by two assessors, and the final assessment was determined after a consensus meeting between the two reviewers. The instrument developed by Devji et al. [34] for this specific purpose was used under license authorization from McMaster University, as it is the only published tool created for evaluating the credibility of minimal important differences generated by anchor-based methods. It is composed of 1) a core criterion with five items related to anchor-based methods, and 2) four items related to the transition rating anchors. The first item has a dichotomous yes/no response option; the other items are rated on a five-point scale with the following response options: definitely yes, to a great extent, not so much, definitely no, or impossible to tell.

There is no specific guidance on how to summarize the different domains of this tool into a final assessment of the credibility of the minimal important difference. Therefore, the final assessment for each minimal important difference was defined according to decision rules prepared in advance by the team, based on similar decision rules used when implementing the Cochrane risk of bias (RoB2) tool for randomized controlled trials. Three categories were created to determine the final assessment of minimal important difference credibility, as follows:

  1) Low credibility: when most, or even one, of the items were scored with a negative answer (i.e., not so much or definitely no);

  2) Some concerns: when no item received a negative answer and the remaining items were assessed as “impossible to tell”;

  3) High credibility: when all the items were assessed with a positive answer (i.e., to a great extent or definitely yes).
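The decision rules above can be sketched in code. This is only an illustration under stated assumptions, not the review team's implementation: the function name `classify_credibility` is hypothetical, the dichotomous first item is assumed to map "yes" to a positive and "no" to a negative answer, and "most or one" is read as "at least one negative answer".

```python
# Answer categories taken from the five-point scale described in the text.
# Mapping the yes/no first item onto these sets is an assumption.
NEGATIVE = {"not so much", "definitely no", "no"}
POSITIVE = {"to a great extent", "definitely yes", "yes"}
UNCLEAR = {"impossible to tell"}
ALLOWED = NEGATIVE | POSITIVE | UNCLEAR


def classify_credibility(answers):
    """Classify one MID estimate from its item-level answers (hypothetical helper)."""
    normalized = [answer.strip().lower() for answer in answers]
    if not normalized:
        raise ValueError("at least one answer is required")
    unknown = [answer for answer in normalized if answer not in ALLOWED]
    if unknown:
        raise ValueError(f"unrecognized answers: {unknown}")
    # Rule 1: any negative answer leads to low credibility.
    if any(answer in NEGATIVE for answer in normalized):
        return "low credibility"
    # Rule 2: no negative answers, but at least one item could not be judged.
    if any(answer in UNCLEAR for answer in normalized):
        return "some concerns"
    # Rule 3: every item received a positive answer.
    return "high credibility"
```

Under this reading, a single "not so much" is enough to place an estimate in the low-credibility category, which is consistent with all 78 assessments in this review ending up as low credibility.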

Data synthesis

The findings of this review were described in a narrative (descriptive) synthesis, organized in evidence tables that compiled study details, results, and data analysis. Data synthesis was performed according to the patient-reported outcome measures reported by the authors and the method of calculation used to provide the minimal important difference. Minimal important differences provided by distribution-based methods were analyzed separately according to the type of calculation (i.e., effect size, standardized response mean, standard error of measurement, standard deviation) and the time range of re-evaluation (e.g., 6 weeks, 12 weeks, 12 months). Minimal important differences provided by anchor-based methods were synthesized following guidance from a previous systematic review about minimal important difference [26]. The absolute minimal important difference (mean difference associated with minimum improvement) was calculated for each study separately by checking the original papers and extracting the mean change of the group of participants that reported a slight improvement, according to the anchor applied during data collection.
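To make the quantities named above concrete, the sketch below computes the three distribution-based statistics and the anchor-based mean change for a slightly improved group. It uses the common textbook definitions (effect size as mean change over baseline standard deviation, standardized response mean as mean change over the standard deviation of change, standard error of measurement as baseline standard deviation times the square root of one minus reliability); the exact formulas applied by each primary study are not given in this section, so these are assumptions. All function names and the anchor label "a little better" are hypothetical.

```python
import math
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def standard_error_of_measurement(baseline, reliability):
    """Baseline SD scaled by sqrt(1 - reliability), e.g. a test-retest coefficient."""
    return stdev(baseline) * math.sqrt(1 - reliability)


def anchor_based_mid(changes, anchor_ratings, target="a little better"):
    """Mean change among participants whose anchor rating equals `target`.

    `target` stands in for the 'slight improvement' category of whichever
    anchor a study used; the label itself is an illustrative assumption.
    """
    selected = [c for c, rating in zip(changes, anchor_ratings) if rating == target]
    if not selected:
        raise ValueError("no participant reported the target level of change")
    return mean(selected)
```

Restricting `anchor_based_mid` to the slightly improved group mirrors the absolute minimal important difference described in the paragraph above; passing a category such as "much better" instead would reproduce the practice that the review identifies as inconsistent with the definition.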

After data synthesis, we planned to plot all minimal important difference estimates based on anchor methods together by triangulation, in order to define a single value for each instrument included in the present review, on the assumption that we would find evidence from multiple studies. However, the primary studies were highly heterogeneous in terms of patient-reported outcome measures, anchors, and population characteristics, which violated the recommendations for performing triangulation [36]. In addition, a meta-analysis could not be conducted because of insufficient data.

Quality of evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) [37] approach was applied to assess the overall certainty of the evidence and to grade the strength of recommendations from minimal important differences reported according to anchor-based methods. This assessment was based on the credibility of the minimal important difference (which was analogous to the risk of bias of studies), inconsistency, indirectness, imprecision, and publication bias. We reported GRADE following previous recommendations on how to rate the certainty of evidence in the absence of pooled results and meta-analysis [38].

The level of evidence was downgraded for inconsistency and/or indirectness in cases where: minimal important differences from patient-reported outcome measures were reported by a single study; different anchors were applied to calculate the minimal important difference; studies included different population diagnoses or time points when the minimal important differences were calculated; or studies used different levels of improvement to determine the minimal important difference (minimal, moderate, or strong) when conducting their analysis. The evidence was downgraded for imprecision when the total sample size was less than 300 participants.
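As a rough illustration of how such domain judgments combine into a final rating, the sketch below applies the usual GRADE convention of moving down one level for a "serious" and two levels for a "very serious" concern. Several points are assumptions rather than statements from this review: the starting level of "high", the function names, and the choice to map a sample size below 300 to a single-level ("serious") downgrade, since the text does not say when imprecision was rated "very serious".

```python
LEVELS = ["high", "moderate", "low", "very low"]
DOWNGRADE = {"not serious": 0, "serious": 1, "very serious": 2}


def imprecision_rating(total_sample_size, threshold=300):
    """Return the imprecision judgment for one outcome.

    The review downgrades when the total sample is below 300; treating that
    as 'serious' (one level) is an assumption made for this sketch.
    """
    return "serious" if total_sample_size < threshold else "not serious"


def grade_certainty(domains, start="high"):
    """Combine domain judgments into a final certainty rating.

    `domains` maps a domain name to 'not serious', 'serious', or 'very serious'.
    """
    steps = sum(DOWNGRADE[judgment] for judgment in domains.values())
    index = min(LEVELS.index(start) + steps, len(LEVELS) - 1)
    return LEVELS[index]


example = {
    "risk of bias (credibility)": "very serious",
    "inconsistency": "serious",
    "indirectness": "not serious",
    "imprecision": imprecision_rating(250),
    "publication bias": "not serious",
}
print(grade_certainty(example))
```

Because the rating cannot fall below "very low", a very serious credibility concern combined with any other serious concern is already enough to reach the floor.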

The final rating of the studies was classified as high, moderate, low, or very low certainty of evidence [37].


Study selection

A total of 1,662 papers were found through the database search; 719 references were duplicates, so 943 studies entered screening. After the screening of titles and abstracts, 54 potentially eligible studies were selected for full-text review, and 10 studies met the inclusion criteria [39,40,41,42,43,44,45,46,47,48]. Reasons for exclusion are available in the PRISMA flowchart (Fig. 1) and details of exclusions are provided in Appendix 3. After the manual search, two additional studies were included [49, 50]. Therefore, 12 studies were analyzed.
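The selection counts reported above can be cross-checked with simple arithmetic. The snippet below is a small consistency check, not part of the review's methods; the function name `prisma_flow` is hypothetical, and the two exclusion counts are derived here by subtraction rather than quoted from the text.

```python
def prisma_flow(identified, duplicates, full_text, included_db, included_manual):
    """Derive the intermediate counts of a PRISMA-style selection flow."""
    screened = identified - duplicates
    return {
        "screened": screened,
        # Derived by subtraction; not stated explicitly in the text.
        "excluded_at_title_abstract": screened - full_text,
        "excluded_at_full_text": full_text - included_db,
        "total_included": included_db + included_manual,
    }


flow = prisma_flow(
    identified=1662,
    duplicates=719,
    full_text=54,
    included_db=10,
    included_manual=2,
)
# The two counts stated in the text are reproduced by the arithmetic.
assert flow["screened"] == 943
assert flow["total_included"] == 12
print(flow)
```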

Fig. 1

PRISMA flowchart

Characteristics of included studies

The general information of the 12 studies included in this review is described in Table 1. Most of the studies were conducted in the United States of America [39,40,41,42, 44, 46] and published after 2010 [42,43,44,45,46,47,48,49,50], and minimal important differences were derived mainly from data of randomized controlled trials [39,40,41,42, 46, 48, 50] related to non-surgical [39,40,41,42, 45, 48, 50] and surgical [43, 44, 46, 47, 49, 50] interventions. One study, conducted as a secondary analysis of two different trials, assessed the effectiveness of both surgical and conservative interventions for urinary incontinence [50]. Nine studies included participants with stress urinary incontinence [40, 41, 43,44,45,46, 48,49,50], one study included participants with urgency urinary incontinence [42], and three included women with mixed urinary incontinence [39, 47, 50]. The diagnosis of the participants’ symptoms was based on subjective tools (i.e., self-report, validated questionnaires, interviews with health professionals) and objective tools and tests, especially urodynamics. Eight studies reported minimal important differences according to distribution-based methods [41,42,43,44, 46,47,48,49], while 10 studies reported minimal important differences according to anchor-based methods [39,40,41,42,43, 45, 46, 48,49,50].

Table 1 General information of included studies (n=12)

Analysis of credibility

Ten studies [39,40,41,42,43, 45, 46, 48,49,50] determined minimal important differences of several patient-reported outcome measures using anchor-based methods and provided 78 different minimal important differences. Therefore, we performed one evaluation for each minimal important difference separately, resulting in 78 credibility assessments. All reports related to minimal important differences according to anchor-based methods presented low credibility. More details about the scores of the credibility tool are reported in Appendix 4.

In most cases (n=78), the studies met the first criterion of the tool, which assesses whether participants responded to the patient-reported outcome measures and the anchor directly. Moreover, the anchors used during data collection were considered understandable (second criterion) in 75 cases.

In 24 derived minimal important difference calculations, the correlation between the patient-reported outcome measures and the anchor was not reported (third criterion), although most authors mentioned a general correlation of ≥0.3 between the instruments (n=52). Similarly, most authors failed to meet the fourth criterion of the tool, which addresses the precision of the minimal important difference estimate (n=61; 78.2%). In 42 cases, the threshold applied on the anchor did not reflect a small but important difference in the health status of the patients, which contradicts the definition of the minimal important difference.

For 63 minimal important difference estimates, the time between the first and second assessments was considered long (more than two or three months), which relates to the sixth criterion. This can likely be linked to recall bias (i.e., biased perception of the actual health) and difficulty in assessing the previous health status [34]. The correlation between the transition score and the prescore and postscore on the target instrument (seventh and eighth criteria) was reported for only a few estimates in three different studies [42, 43, 46].

The risk of bias graph and the summary results are presented in Appendices 5 and 6, respectively.

Synthesis of results

All minimal important difference estimates were provided for 13 different patient-reported outcome measures. Although we targeted several types of outcomes in this review, no study reported minimal important difference estimates for physical assessment of pelvic floor muscles’ function, for example. Some authors also provided the minimally important difference for subscales of patient-reported outcome measures. This was the case for the Incontinence Quality of Life (I-QOL): Avoidance and Limiting Behavior, Psychosocial Impacts and Social Embarrassment domains [40]; Pelvic Floor Impact Questionnaire (PFIQ) – UIQ subscale; Pelvic Floor Distress Inventory (PFDI) – general score for UDI [43], and stress and irritative subscales [41]; Overactive Bladder Questionnaire (OAB-q) – Symptom Severity subscore [42]; the Australian Pelvic Floor Questionnaire – Bladder and global score [49]; and the International Consultation on Incontinence Questionnaire – Female Lower Urinary Tract Symptoms (ICIQ-FLUTS) – incontinence domain [50].

Ten different subjective and objective anchors were found among the studies. The Patient Global Impression of Improvement, also known as the Global Rating Scale, was the most used, followed by the voiding diary, satisfaction with the treatment, and the pad test.

Table 2 describes the main details regarding the population, the patient-reported outcome measures, anchors, data analysis, and conclusions reported by the included studies. Although one study reported minimal important differences according to anchor methods for the Michigan Incontinence Symptom Index (M-ISI) [44], its results were not considered in the present review because the statistical method applied by the authors was not clear in the manuscript, and the authors did not respond to our e-mail. Appendix 7 provides details about the methods and concepts used to provide minimal important differences using anchor-based methods. Appendix 8 presents a matrix table with a compilation of the minimal important differences extracted from the primary studies according to the distribution- and anchor-based methods.

Table 2 Characteristics of primary studies included in this systematic review

Tables 3 and 4 provide the quantitative data extracted from the studies that reported minimal important differences according to distribution- and anchor-based methods, respectively. Minimal important difference estimates for distribution-based methods represent the “points” for each patient-reported outcome measure. Three main distribution-based analyses were used by the included studies: effect size, standardized response mean, and standard error of measurement. Minimal important differences derived from anchor-based methods were reported using different estimates, including the mean, standard deviation, and absolute value, followed by the 95% confidence intervals and minimum-maximum values for the specific patient-reported outcome measures. Time points (follow-up) differed between studies (6, 10, 12, 14 weeks; and 4, 8, 12 and 12 months). In addition, there was a lack of clarity regarding the time point in four primary studies [42, 44, 46, 47]. Table 4 also shows, using different symbols, the level of improvement considered by the authors when calculating the minimal important differences by anchor-based methods. Although different patient-reported outcome measures and anchors were applied, most of the studies did not consider the smallest difference identified by the participants to calculate the minimal important difference. The most used level to generate the minimal important difference was moderate to strong improvement.

Table 3 Quantitative results from the studies included in the present systematic review, according to distribution-based methods.
Table 4 Quantitative results from the studies included in the present systematic review, according to anchor-based methods

Figure 2 provides the minimal important difference estimates ranging from 0 to 10 points in their respective patient-reported outcome measures from the included studies, considering the scores related to the smallest improvement of urinary incontinence. Figure 3 presents the minimal important differences with a wider range of scores in the patient-reported outcome measures (-150 to +150).

Fig. 2

MID estimates and 95% CIs considering the slight improvement reported by the authors, for MIDs ranging from 0 to 10 points in their respective PROMs. CI: confidence interval; ICIQ-SF: International Consultation on Incontinence Questionnaire - Short Form; I-QOL: Incontinence Quality of Life; MID: minimal important difference; PGI-I: Patient Global Impression of Improvement questionnaire

Fig. 3

MID estimates and 95% CIs considering the slight improvement reported by the authors, for MIDs ranging from -150 to +150 points in their respective PROMs. CI: confidence interval; MID: minimal important difference; PFDI: Pelvic Floor Distress Inventory; PFIQ: Pelvic Floor Impact Questionnaire; PGI-I: Patient Global Impression of Improvement questionnaire; UDI: Urogenital Distress Inventory; UIQ: Urinary Impact Questionnaire; VAS: visual analogue scale

Certainty of evidence

All the minimal important differences reported by anchor-based methods were rated as very low quality of evidence. For more details about GRADE, see Appendix 9.

All studies [39,40,41,42,43, 45, 46, 48,49,50] presented very serious concerns about the risk of bias, which means that they presented low credibility in calculating and reporting the minimal important difference according to anchor-based methods. There was also serious and very serious inconsistency in the studies.

We downgraded the quality/certainty of the evidence for inconsistency (ICIQ-SF [45, 46, 48], ICIQ-LUTSqol [45, 48], UDI [41, 42]) and for indirectness when studies did not restrict their analysis to the population with minimal improvement (according to the minimal important difference definition and the main question of the present review). Considering this last criterion, three patient-reported outcome measures presented “not serious” indirectness (Australian Pelvic Floor Questionnaire [49], IQOL-Subscores [40], UIQ [41, 43]), four presented “serious” indirectness (UDI [41, 42], UDI-Irritative subscale [42], UDI-Stress subscale [41], OAB-q [42]), and three presented “very serious” indirectness (IQOL-Total score [39, 40], ICIQ-SF [45, 46, 48], ICIQ-LUTSqol [45, 48]).

Most of the outcomes had a sample size >300, although two patient-reported outcome measures were rated as having serious imprecision (UDI-Irritative subscale [42], OAB-q [42]) and one outcome was rated as having very serious imprecision (Australian Pelvic Floor Questionnaire [49]).

Publication bias was not considered for this systematic review since the search process was comprehensive and exhaustive.


We included 12 studies that reported minimal important differences in outcome measures used when managing female urinary incontinence, with high variability in methods and values. Minimal important differences were reported for thirteen different patient-reported outcome measures, most of the time according to anchor-based methods, using ten different anchors. However, all studies with anchor-based methods presented low credibility and very low overall certainty. Also, minimal important difference values seem to change according to the time points used to generate them (i.e., follow-up of 4 or 6 weeks, 12 and 24 months), the characteristics of the population (i.e., type of urinary incontinence), and the anchors used.

Similar to a previous review [51], minimal important differences provided by distribution-based methods were smaller than the ones provided by anchor-based methods, which could suggest that a smaller change is necessary to represent a clinically significant difference [52]. It is known that distribution-based methods only consider the distribution of the scores in their calculations and are usually related to the variation/change observed in a standardized way around the mean. For this reason, previous literature suggested that anchor-based methods should be preferred over distribution-based methods [17].

A possible explanation for the wide variability in these minimal important differences is the level of patient improvement considered during data analysis. Although some authors have noted that there is neither consensus nor evidence about the best criteria for determining the minimal important difference using anchor-based methods [17, 53], calculations that include participants who considered themselves moderately or greatly improved after an intervention could lead to different minimal important difference estimates, and they do not follow the original concept of the minimal important difference, which refers to the “smallest difference” in scores that individuals consider beneficial [54]. In the present systematic review, the majority of studies did not consider the smallest improvement perceived by the patients in their calculations, so future studies could be biased if they use these values to estimate their sample size or to interpret their results. Halme et al. [55] published a study that compiled estimates for calculating sample sizes of trials to treat female urinary incontinence according to minimal important differences. In their statistical analysis, the authors included participants who reported a “very much better” improvement after treatment, which does not represent the smallest difference perceived by the patient.
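The effect of the chosen improvement group can be sketched in a few lines of Python. The example below groups change scores by anchor rating and compares the mean change of the “a little better” group with the mean change of everyone who reported any improvement. The function name, the anchor labels, and all scores are invented for illustration; they are not data from the included studies.

```python
import statistics


def mean_change_by_anchor(changes, anchor_ratings):
    """Group change scores by anchor rating and return the mean of each group."""
    groups = {}
    for change, rating in zip(changes, anchor_ratings):
        groups.setdefault(rating, []).append(change)
    return {rating: statistics.mean(values) for rating, values in groups.items()}


# Hypothetical change scores (follow-up minus baseline; negative = improvement).
changes = [-1, -2, -3, -6, -7, -8, 0, 1]
ratings = (["a little better"] * 3
           + ["very much better"] * 3
           + ["no change"] * 2)

means = mean_change_by_anchor(changes, ratings)

# Estimate that follows the "smallest difference" concept.
mid_smallest = means["a little better"]

# Estimate that pools every participant who reported any improvement.
improved = [c for c, r in zip(changes, ratings) if r != "no change"]
mid_all_improved = statistics.mean(improved)

print(mid_smallest, mid_all_improved)
```

In this made-up sample, pooling all improved participants more than doubles the absolute value of the estimate, which is the kind of inflation described in the paragraph above.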

Previous studies [26, 53] recognized the need for studies that validate the anchors commonly used to collect data on patients’ perception of a treatment. Furthermore, the procedures used to assess changes that are important to the patient need to be standardized by establishing a valid and specific question for this purpose. The lack of validation and standardization leads to variability in the results, because different anchors are applied to calculate minimal important differences [53], which generates inconsistency between studies that assess minimal important differences.

The literature suggests that anchors should be selected based on their relevance and should lie close to the construct assessed by the patient-reported outcome measure, which is usually analyzed through the correlation between the two tools (anchor and patient-reported outcome measure). Also, researchers and clinicians should consider the characteristics of the sample and the severity of the disease in order to define an adequate anchor. In addition, this rationale should be based on previous guidance and scientific evidence [29]. A previous study also found that derived minimal important differences are highly variable due to discrepancies in the study designs, methods, and concepts used to calculate them [26]. These results agree with the present review.

The newly developed tool used to assess the credibility of minimal important differences derived by anchor-based methods showed that the studies presented low credibility. Most studies did not report a prerequisite for calculating minimal important differences: the correlation between the patient-reported outcome measure and the anchor. In addition, only three studies [42, 43, 46] reported the correlations between anchors and patient-reported outcome measure scores during follow-up. This missing information could also help to explain the variability found in the minimal important difference values [53]. Because the anchor and the patient-reported outcome measure should measure the same or similar underlying constructs, a correlation between the tools indicates that they are closely linked. Therefore, anchors with absent or low correlation will provide inaccurate minimal important difference estimates [34].
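A minimal sketch of this prerequisite check is shown below, assuming the anchor is coded as an ordinal number and that a rank (Spearman) correlation with the change score is used. The function names, the numeric coding of the anchor, the example data, and the idea of passing an explicit threshold are assumptions made for illustration; the review does not prescribe a specific coefficient or cut-off.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def meets_threshold(rho, threshold):
    """True if the absolute correlation reaches a caller-chosen threshold."""
    return abs(rho) >= threshold


# Hypothetical data: anchor coded 0 (no change) to 3 (very much better),
# and the improvement on the questionnaire for the same six participants.
anchor = [0, 1, 1, 2, 3, 3]
improvement = [0, 2, 1, 4, 6, 7]
rho = spearman(anchor, improvement)
print(round(rho, 2))
```

Reporting this coefficient alongside the minimal important difference lets readers judge whether the anchor and the questionnaire track the same construct.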

Attention should be drawn to methodological issues related to the calculation and reporting of minimal important differences when interpreting results reported in the literature. It is important to evaluate the credibility of minimal important differences, since there is substantial misunderstanding of methods and concepts that can lead to incorrect reporting of their values. Authors should follow existing guidance when conducting studies with this aim. This guidance can be found in previous studies [17] and by interpreting and incorporating the items assessed by the credibility tool [34] in future studies.

This review contributes substantially to Women’s Health research. A summary of the minimal important differences for outcomes related to urinary symptoms in the literature may contribute to evidence-based practice by complementing statistical results with clinicians’ clinical experience and patients’ perception of a treatment [17, 28]. It may result in a new direction for the treatment of urinary symptoms, since it brings a focus to interventions that are clinically relevant and can be successfully implemented in clinical practice. Moreover, results from the literature may be reinterpreted, as we highlight the estimates that can be used to classify study results as clinically relevant and not only as statistically significant. This may reveal that previous studies over- or underestimated treatment effects by interpreting only the results of statistical analysis. In addition, our results could facilitate the design and planning of future studies, for example by supporting accurate sample size calculations and the choice of the best outcome measures, and could therefore facilitate the future uptake of clinical research into practice. Researchers are encouraged to incorporate these outcomes in their clinical studies to measure the effectiveness of interventions, taking into consideration not only statistical significance but also clinical relevance.
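To show how strongly a sample size depends on the minimal important difference that is plugged in, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 × ((z for 1 − α/2 + z for power) × SD / MID)². The function name, the standard deviation of 5 points, and the two candidate minimal important differences are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(mid, sd, alpha=0.05, power=0.80):
    """Participants needed per group to detect a difference of `mid`
    between two means (two-sided test, equal group sizes, common SD),
    using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / mid) ** 2)


# Hypothetical scenario: SD of 5 points on the questionnaire.
n_large_mid = n_per_group(mid=5.0, sd=5.0)   # MID from a greatly improved group
n_small_mid = n_per_group(mid=2.5, sd=5.0)   # MID from a slightly improved group
print(n_large_mid, n_small_mid)
```

Halving the minimal important difference roughly quadruples the required sample size, so a trial planned around an inflated estimate may be underpowered to detect the smallest change that patients actually perceive as beneficial.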

This systematic review followed a rigorous methodological sequence, which included the preparation and registration of a protocol for the review and a systematic search of the most important databases. Study eligibility, data extraction, and credibility assessment were performed by two independent researchers. Moreover, the present review only included studies that reported minimal important differences according to analyses already recommended by previous guidelines. We reported which tools already have a minimal important difference available for use in clinical research. In addition, we synthesized the steps and information necessary to calculate and analyze the minimal important difference, along with guidance to help researchers interpret it correctly. Furthermore, we emphasized some limitations and misconceptions related to minimal important differences that arose from the results of the present review.

The present systematic review has some limitations. The limited number of included studies did not allow us to perform subgroup analyses according to the type of urinary incontinence, the method of calculation (i.e., distribution- or anchor-based method), and/or the anchors used during data analysis. Moreover, it was not possible to assess the credibility of studies that reported minimal important differences according to distribution-based methods, as the tool described by Devji et al. [34] was developed to evaluate studies that reported minimal important differences by anchor-based methods (the most accepted approach for generating minimal important differences). In addition, although guidance exists on how to apply the tool, some specific points needed clarification, especially when deriving a final assessment. The authors of the present review agreed on decision rules to assess the credibility of the minimal important differences derived in the analyzed studies. These decision rules might be considered arbitrary; however, they were based on similar decision rules used in risk-of-bias (RoB) assessment of randomized controlled trials (RCTs).

Although we provide minimal important differences derived by anchor-based methods according to the smallest improvement based on the mean change, our analysis was restricted by the availability of data reported by the studies, such as the patient-reported outcome measure scores of the group of patients who considered themselves “a little better”. In cases where data were not available, the calculation was not possible, which limited the information reported in our review.

We planned to triangulate minimal important differences derived from the same patient-reported outcome measures, considering the method of calculation (i.e., distribution- or anchor-based method) and/or the anchors used during data analysis. However, given the variability among the studies, it was not possible to calculate one single value of minimal important difference for each patient-reported outcome measure. This is a common limitation among systematic reviews that try to compile the minimal important differences available for different patient-reported outcome measures [26, 56]. Previous reports39,58,64,6 concluded that minimal important differences could not be interpreted as a constant characteristic and that a universally empirical score could not be derived. Instead, it is recommended that the minimal important difference be analyzed and considered according to the severity of the condition at baseline, the type of treatment, the units of the patient-reported outcome measure, the conditions of the population, and the context where the patient is located [29, 51, 56, 57]. In addition, it seems that minimal important differences can also change according to the characteristics of the population [53]. That was also the case in the present study: minimal important differences from a population with urgency urinary incontinence [42] differed from those for the same patient-reported outcome measures in a sample with stress urinary incontinence [41]. Therefore, authors should take care to include these characteristics in their reports about minimal important differences.

Moreover, due to the limited number of studies, our study did not use sensitivity analysis to explore the factors that could lead to the variability among the reported minimal important differences. Future studies should perform specific statistical analyses to identify the factors associated with this variability in order to reduce the disparity among studies. In addition, future studies should follow the recommendations for reporting minimal important differences and should: 1) report the scores at baseline and follow-up, in order to enable future explorations, even considering the variability among studies [26]; 2) improve the reporting of the correlations found between anchors and patient-reported outcome measures at baseline and follow-up; and 3) aim to validate anchors often used in studies of Women’s Health.

Twelve different patient-reported outcome measures with respective minimal important differences for outcomes related to urinary incontinence were found in the literature, comprising 48 and 65 minimal important differences reported according to distribution- and anchor-based methods, respectively. Values derived from distribution-based methods were smaller than those derived from anchor-based methods. However, the credibility and certainty of evidence of all the minimal important differences related to urinary incontinence measures reported by anchor-based methods were low and very low, respectively. The methodology used to derive minimal important differences for outcomes related to urinary incontinence needs to be improved.

Availability of data and materials

Not applicable.


  1. Haylen BT, De Ridder D, Freeman RM, Swift SE, Berghmans B, Lee J, et al. An international urogynecological association (IUGA)/international continence society (ICS) joint report on the terminology for female pelvic floor dysfunction. Neurourol Urodyn. 2010;29(1):4–20.

  2. Bush TA, Castellucci DT, Phillips C. Exploring women’s beliefs regarding urinary incontinence. Urol Nurs. 2001;21(3):211–8.

  3. Santiagu SK, Arianayagam M, Wang A, Rashid P. Urinary incontinence-pathophysiology and management outline. Aust Fam Phys. 2008;37(3):106–10.

  4. McKellar K, Abraham N. Prevalence, risk factors, and treatment for women with stress urinary incontinence in a racially and ethnically diverse population. Neurourol Urodyn. 2019;38(3):934–40.

  5. Verbeek M, Hayward L. Pelvic floor dysfunction and its effect on quality of sexual life. Sex Med Rev. 2019;7(4):559–64.

  6. Turner DA, Shaw C, McGrother CW, Dallosso HM, Cooper NJ. The cost of clinically significant urinary storage symptoms for community dwelling adults in the UK. BJU Int. 2004;93(9):1246–52.

  7. Milsom I, Coyne KS, Nicholson S, Kvasz M, Chen CI, Wein AJ. Global prevalence and economic burden of urgency urinary incontinence: a systematic review. Eur Urol. 2014;65(1):79–95.

  8. Pizzol D, Demurtas J, Celotto S, Maggi S, Smith L, Angiolelli G, et al. Urinary incontinence and quality of life: a systematic review and meta-analysis. Aging Clin Exp Res. 2021;33(1):25–35.

  9. Seshan VM JK. Dimensions of the impact of urinary incontinence on quality of life of affected women: a review of the English literature. Int J Urol Nurs. 2014;8(2):62–70.

  10. John G, Bardini C, Combescure C, Dällenbach P. Urinary incontinence as a predictor of death: a systematic review and meta-analysis. PLoS One. 2016;11(7):e0158992.

  11. Armijo-Olivo S, Warren S, Fuentes J, Magee DJ. Clinical relevance vs. statistical significance: using neck outcomes in patients with temporomandibular disorders as an example. Man Ther. 2011;16(6):563–72.

  12. Millis SR. Emerging standards in statistical practice: implications for clinical trials in rehabilitation medicine. Am J Phys Med Rehabil. 2003;82(10 Suppl):S32–7.

  13. Ogles BM, Lunnen KM, Bonesteel K. Clinical significance: history, application, and current practice. Clin Psychol Rev. 2001;21(3):421–46.

  14. Armijo-Olivo S, Rappoport K, Fuentes J, Gadotti IC, Major PW, Warren S, et al. Head and cervical posture in patients with temporomandibular disorders. J Orofac Pain. 2011;25(3):199–209.

  15. Collins JP. Measures of clinical meaningfulness and important differences. Phys Ther. 2019;99(11):1574–9.

  16. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002;77(4):371–83.

  17. Armijo-Olivo S, de Castro-Carletti E, Calixtre L, de Oliveira-Souza A, Mohamad N, Fuentes J. Understanding clinical significance in rehabilitation: a primer for researchers and clinicians. Am J Phys Med Rehabil. 2022;101(1):64–77.

  18. Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003;56(5):395–407.

  19. de Vet HCW, Terluin B, Knol DL, Roorda LD, Mokkink LB, Ostelo RWJG, et al. Three ways to quantify uncertainty in individually applied “minimally important change” values. J Clin Epidemiol. 2010;63(1):37–45.

  20. Lemieux J, Beaton DE, Hogg-Johnson S, Bordeleau LJ, Goodwin PJ. Three methods for minimally important difference: no relationship was found with the net proportion of patients improving. J Clin Epidemiol. 2007;60(5):448–55.

  21. Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res. 1993;2(3):221–6.

  22. Wright A, Hannon J, Hegedus EJ, Kavchak AE. Clinimetrics corner: a closer look at the minimal clinically important difference (MCID). J Man Manip Ther. 2012;20(3):160–6.

  23. McGlothlin AE, Lewis RJ. Minimal clinically important difference: defining what really matters to patients. JAMA. 2014;312(13):1342–3.

  24. Devji T, Guyatt GH, Lytvyn L, Brignardello-Petersen R, Foroutan F, Sadeghirad B, et al. Application of minimal important differences in degenerative knee disease outcomes: a systematic review and case study to inform BMJ Rapid Recommendations. BMJ Open. 2017;7(5):e015587.

  25. Hao Q, Devji T, Zeraatkar D, Wang Y, Qasim A, Siemieniuk RAC, et al. Minimal important differences for improvement in shoulder condition patient-reported outcomes: a systematic review to inform a BMJ Rapid Recommendation. BMJ Open. 2019;9(2):e028777.

  26. Olsen MF, Bjerre E, Hansen MD, Tendal B, Hilden J, Hróbjartsson A. Minimum clinically important differences in chronic pain vary considerably by baseline pain and methodological factors: systematic review of empirical studies. J Clin Epidemiol. 2018;101:87–106.e2.

  27. Ousmen A, Touraine C, Deliu N, Cottone F, Bonnetain F, Efficace F, et al. Distribution- and anchor-based methods to determine the minimally important difference on patient-reported outcome questionnaires in oncology: a structured review. Health Qual Life Outcomes. 2018;16(1):228.

  28. Nilsagård Y, Lohse G. Evidence-based physiotherapy: a survey of knowledge, behaviour, attitudes and prerequisites. Adv Physiother. 2010;12(4):179–86.

  29. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61(2):102–9.

  30. Carrasco-Labra A, Devji T, Qasim A, Phillips MR, Wang Y, Johnston BC, et al. Minimal important difference estimates for patient-reported outcomes: a systematic survey. J Clin Epidemiol. 2021;133:61–71.

  31. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

  32. Mokkink LB, Prinsen CA, Bouter LM, Vet HC, Terwee CB. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and how to select an outcome measurement instrument. Braz J Phys Ther. 2016;20(2):105–13.

  33. Deegan EG, Stothers L, Kavanagh A, Macnab AJ. Quantification of pelvic floor muscle strength in female urinary incontinence: a systematic review and comparison of contemporary methodologies. Neurourol Urodyn. 2018;37(1):33–45.

  34. Devji T, Carrasco-Labra A, Qasim A, Phillips M, Johnston BC, Devasenapathy N, et al. Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ. 2020;369:m1714.

  35. Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–84.

  36. Trigg A, Griffiths P. Triangulation of multiple meaningful change thresholds for patient-reported outcome scores. Qual Life Res. 2021;30(10):2755–64.

  37. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

  38. Murad MH, Mustafa RA, Schünemann HJ, Sultan A, Santesso N. Rating the certainty in evidence in the absence of a single estimate of effect. Evid Based Med. 2017;22(3):85–7.

  39. Patrick DL, Martin ML, Bushnell DM, Yalcin I, Wagner TH, Buesching DP. Quality of life of women with urinary incontinence: further development of the incontinence quality of life instrument (I-QOL). Urology. 1999;53(1):71–6.

  40. Yalcin I, Patrick DL, Summers K, Kinchen K, Bump RC. Minimal clinically important differences in Incontinence Quality-of-Life scores in stress urinary incontinence. Urology. 2006;67(6):1304–8.

  41. Barber MD, Spino C, Janz NK, Brubaker L, Nygaard I, Nager CW, Wheeler TL, Pelvic Floor Disorders Network. The minimum important differences for the urinary scales of the Pelvic Floor Distress Inventory and Pelvic Floor Impact Questionnaire. Am J Obstet Gynecol. 2009;200(5):580.e1–7.

  42. Dyer K, Lukacz E, Brubaker L, Chai T, Markland A, Nygaard I, et al. Minimum important difference for validated instruments in women with urge incontinence. Neurourol Urodyn. 2010;29(2):301.

  43. Chan SSC, Cheung RYK, Lai BPY, Lee LL, Choy KW, Chung TKH. Responsiveness of the pelvic floor distress inventory and pelvic floor impact questionnaire in women undergoing treatment for pelvic floor disorders. Int Urogynecol J Pelvic Floor Dysfunct. 2013;24(2):213–21.

  44. Suskind AM, Dunn RL, McGuire EJ, Wei JT, Morgan DM, Delancey JOL. The Michigan incontinence symptom index (M-ISI): a clinical measure for type, severity, and bother related to urinary incontinence. Neurourol Urodyn. 2014;33(7):1128–34.

  45. Nystrom E, Sjostrom M, Stenlund H, Samuelsson E. ICIQ symptom and quality of life instruments measure clinically relevant improvements in women with stress urinary incontinence. Neurourol Urodyn. 2015;34(8):747–51.

  46. Sirls LT, Tennstedt S, Brubaker L, Kim HY, Nygaard I, Rahn DD, et al. The minimum important difference for the International consultation on incontinence questionnaire - Urinary incontinence short form in women with stress urinary incontinence. Neurourol Urodyn. 2015;34(2):183–7.

  47. Luz R, Pereira I, Henriques A, Ribeirinho AL, Valentim-Lourenco A. King’s Health Questionnaire to assess subjective outcomes after surgical treatment for urinary incontinence: can it be useful? Int Urogynecol J. 2017;28(1):139–45.

  48. Lim R, Liong ML, Lim KK, Leong WS, Yuen KH. The Minimum Clinically Important Difference of the International Consultation on Incontinence Questionnaires (ICIQ-UI SF and ICIQ-LUTSqol). Urology. 2019;133:91–5.

  49. Baessler K, Mowat A, Maher CF. The minimal important difference of the Australian Pelvic Floor Questionnaire. Int Urogynecol J. 2019;30(1):115–22.

  50. Nipa SO, Cooper D, Mostada A, Hagen S, Abdel-Fattah M. Novel clinically meaningful scores for the ICIQ-UI-SF and ICIQ-FLUTS questionnaires in women with stress incontinence. Int Urogynecol J. 2023;34:3033–40.

  51. Jayadevappa R, Cook R, Chhatre S. Minimal important difference to infer changes in health-related quality of life-a systematic review. J Clin Epidemiol. 2017;89:188–98.

  52. Horváth K, Aschermann Z, Ács P, Deli G, Janszky J, Komoly S, et al. Minimal clinically important difference on the Motor Examination part of MDS-UPDRS. Parkinsonism Relat Disord. 2015;21(12):1421–6.

  53. Terwee CB, Roorda LD, Dekker J, Bierma-Zeinstra SM, Peat G, Jordan KP, et al. Mind the MIC: large variation among populations and methods. J Clin Epidemiol. 2010;63(5):524–34.

  54. Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407–15.

  55. Halme AS, Fritel X, Benedetti A, Eng K, Tannenbaum C. Implications of the minimal clinically important difference for health-related quality-of-life outcomes: a comparison of sample size requirements for an incontinence treatment trial. Value Health. 2015;18(2):292–8.

  56. Olsen MF, Bjerre E, Hansen MD, Hilden J, Landler NE, Tendal B, et al. Pain relief that matters to patients: systematic review of empirical studies assessing the minimum clinically important difference in acute pain. BMC Med. 2017;15(1):35.

  57. Santanello NC, Zhang J, Seidenberg B, Reiss TF, Barber BL. What are minimal important changes for asthma measures in a clinical trial? Eur Respir J. 1999;14(1):23–7.


Not applicable.


This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Brazil), Financial Code 001.

Author information



JBS was responsible for the development and submission of the protocol, data synthesis, data analysis and manuscript preparation; LBC was responsible for data synthesis, data analysis and manuscript preparation; DVP was responsible for data synthesis and data analysis; PD was responsible for the development and submission of the protocol, data synthesis, data analysis and manuscript preparation; SAO was responsible for the development and submission of the protocol, data synthesis, data analysis, and manuscript preparation.

Authors' information

Not applicable.

Corresponding author

Correspondence to Jordana Barbosa-Silva.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Barbosa-Silva, J., Calixtre, L.B., Von Piekartz, D. et al. The minimal important difference of patient-reported outcome measures related to female urinary incontinence: a systematic review. BMC Med Res Methodol 24, 60 (2024).
