Effectiveness of Pilates exercise in treating people with chronic low back pain: a systematic review of systematic reviews

Background: Systematic reviews provide clinical practice recommendations that are based on evaluation of primary evidence. When systematic reviews with the same aims have different conclusions, it is difficult to ascertain which review reported the most credible and robust findings.

Methods: This study examined five systematic reviews that have investigated the effectiveness of Pilates exercise in people with chronic low back pain. A four-stage process was used to interpret findings of the reviews. This process included comparison of research questions, included primary studies, and the level and quality of evidence of systematic reviews. Two independent reviewers assessed the level of evidence and the methodological quality of systematic reviews, using the National Health and Medical Research Council hierarchy of evidence, and the Revised Assessment of Multiple Systematic Reviews respectively. Any disagreements were resolved by a third researcher.

Results: A high level of consensus was achieved between the reviewers. Conflicting findings were reported by the five systematic reviews regarding the effectiveness of Pilates in reducing pain and disability in people with chronic low back pain. Authors of the systematic reviews included primary studies that did not match their questions in relation to treatment or population characteristics. A total of ten primary studies were identified across five systematic reviews. Only two of the primary studies were included in all of the reviews due to different inclusion criteria relating to publication date and status, definition of Pilates, and methodological quality. The level of evidence of reviews was low due to the methodological design of the primary studies. The methodological quality of reviews varied. Those which conducted a meta-analysis obtained higher scores.

Conclusion: There is inconclusive evidence that Pilates is effective in reducing pain and disability in people with chronic low back pain. This is due to the small number and poor methodological quality of primary studies. The Revised Assessment of Multiple Systematic Reviews provides a useful method of appraising the methodological quality of systematic reviews. Individual item scores, however, should be examined in addition to total scores, so that significant methodological flaws of systematic reviews are not missed, and results are interpreted appropriately.


Background
Systematic reviews are ranked as the most valid form of research in several hierarchies of evidence [1,2]. They provide evidence-based recommendations from the synthesis and critical appraisal of primary studies [3]. Within health care, systematic reviews are used to efficiently obtain advice regarding client management [4]. Conflicting results of systematic reviews, however, create confusion for readers [5].
Several recently published systematic reviews have investigated the effectiveness of Pilates in people with chronic low back pain (CLBP) [6][7][8][9][10]. Pilates is a mind-body exercise that targets core stability, strength, flexibility, posture, breathing, and muscle control [11]. It has been recommended in the management of people with CLBP, as this type of exercise may strengthen deep, stabilising muscles that support the lumbar spine, such as transversus abdominis [6,12]. These muscles are inhibited in people with CLBP [13,14].
Reviews examining the efficacy of Pilates in people with CLBP, however, report different conclusions. La Touche et al. (2008) [6] suggested that Pilates reduces pain and disability, while Lim et al. (2011) [7] reported that Pilates reduces pain when compared to minimal treatments, but not disability. In contrast, Pereira et al. (2012) [8] concluded that Pilates is ineffective in reducing pain and disability, and Posadzki et al. (2011) [9] suggested that evidence was inconclusive. Aladro-Gonzalvo et al. (2012) also provided conflicting results, reporting that Pilates may reduce pain only when compared to minimal intervention, and disability only when compared to other physiotherapeutic treatments [10]. These contradictory findings make it difficult to draw conclusions about the efficacy of Pilates in people with CLBP and to direct its use in clinical settings.
A systematic review of reviews was conducted to critically evaluate and summarise the results of all published systematic reviews that have investigated the effectiveness of Pilates exercise in reducing pain and disability in people with CLBP. Areas for improvement for systematic reviews were subsequently identified, and an evidence-based conclusion provided regarding the efficacy of Pilates exercise in people with CLBP.

Methods
A four-stage process was used to determine the appropriateness of systematic review conclusions. This involved comparison of reviews with respect to research questions, included primary studies, their level of evidence and methodological quality (Figure 1). The level of evidence of the reviews was assessed using the National Health and Medical Research Council (NHMRC) hierarchy of evidence (2009) [1], while the methodological quality was assessed using the Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) [15]. Systematic review findings were then interpreted with respect to these factors.

Study design
A systematic review design was chosen over a narrative review as it limits bias in the selection and appraisal of evidence [16][17][18]. In a systematic review, a comprehensive search of the literature is undertaken to answer a focused research question; the search strategy and the criteria for selection and critical appraisal of literature are defined; quantitative rather than qualitative results are reported; and evidence-based inferences are made [18]. This systematic review was written to meet Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [3].

Search strategy
A comprehensive literature search was undertaken using ten databases: Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane Library, Medline, Physiotherapy Evidence Database (PEDro), ProQuest: Health and Medical Complete, ProQuest: Nursing and Allied Health Source, ProQuest Research Library: Health and Medicine, Scopus, SPORTDiscus, and Web of Science. The standardised search strategy used the Medical Subject Headings (MeSH) terms "Pilates" and "Low Back Pain", and the search term "Review", in the title, abstract, and, where available, keyword fields, within the maximal date range of each database up until November 4, 2012 (Table 1).
Preliminary searching revealed that expanding the search to include "exercise", "motor control", and "core stability" did not identify any additional reviews, nor did changing the Boolean operator to "or". Removing "Low Back Pain" and "Review" also did not help identify any additional systematic reviews. Secondary searching of reference lists of included papers was undertaken to identify any additional, relevant studies that met the inclusion criteria.

Selection procedures
Selection of relevant papers was based on the title, and if required, review of the abstract or full text of the document. Papers identified from the search process were assessed against inclusion and exclusion criteria by two independent reviewers (CW, BH). If there were any discrepancies in selected papers between the two reviewers, a third reviewer (AB) independently reviewed the papers and through discussion, obtained a consensus.

Selection criteria
To be included in this systematic review, systematic reviews needed to:
• Be identified as a systematic review of 2 or more intervention studies. In a systematic review, a comprehensive search of the literature is undertaken to answer a focused research question; the search strategy and the criteria for selection and critical appraisal of literature are defined; quantitative rather than qualitative results are reported; and evidence-based inferences are made [16,17]. Narrative reviews or expert commentaries did not meet inclusion requirements [17].
• Be published in the English language. For ease of interpretation and access, reviews that were unpublished or published in another language were excluded.
• Include human participants with chronic low back pain, that is, localised pain in the lumbar region that lasts for more than three months [19]. If reviews only included participants with low back pain lasting less than three months, they were excluded.
• Assess the effectiveness of Pilates, where the term "Pilates" was used to describe the type of prescribed exercise being investigated. Exercises described as "motor control" or "lumbar stabilisation" did not suffice for Pilates. This is because Pilates may include features in addition to these exercise approaches [11].
• Use outcome measures to evaluate disability, that is, impairments, activity limitations or participation restrictions according to the International Classification of Functioning, Disability and Health (ICF) [20]. Pain is considered a functional impairment in the ICF.

Level of evidence
According to the NHMRC hierarchy, the level of evidence of a systematic review depends on the methodological design of included primary studies [1]. Systematic reviews that include only randomised controlled trials are rated as the highest form of evidence. Systematic reviews that include studies other than randomised controlled trials are rated only as high as the lowest level of evidence represented by primary studies (Table 2). Two independent reviewers graded the level of evidence of systematic reviews according to the NHMRC hierarchy of evidence [1]. Any discrepancies between the two reviewers were discussed with a third reviewer to obtain a consensus.
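The grading rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the level labels and the mapping from study design to level are assumptions made for the sketch (a simplified reading of the NHMRC intervention hierarchy), and the function name `review_level` is hypothetical.

```python
# Assumed ordering of evidence levels for primary studies, strongest first.
LEVEL_ORDER = ["II", "III-1", "III-2", "III-3", "IV"]

# Assumed mapping from study design to level (illustrative, not exhaustive).
DESIGN_LEVEL = {
    "randomised controlled trial": "II",
    "pseudo-randomised controlled trial": "III-1",
    "case series": "IV",
}


def review_level(designs):
    """Return the level of evidence of a review from its primary study designs."""
    if not designs:
        raise ValueError("a review must include at least one primary study")
    levels = [DESIGN_LEVEL[design] for design in designs]
    # A review containing only randomised controlled trials is rated highest.
    if all(level == "II" for level in levels):
        return "I"
    # Otherwise the review is rated at the lowest level among its studies.
    return max(levels, key=LEVEL_ORDER.index)
```

Under these assumptions, a review that mixes randomised and pseudo-randomised trials is graded within Level III, and a review that includes a single case series drops to Level IV regardless of its other studies.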

Methodological quality
The methodological quality of included systematic reviews was evaluated using the R-AMSTAR [15]. The R-AMSTAR rates the methodological quality of systematic reviews by providing a numerical score for 11 items (Table 3). Each item is scored from 1 to 4, where 1 indicates poor methodological quality and 4 indicates excellent methodological quality [15]. R-AMSTAR items originate from the Assessment of Multiple Systematic Reviews (AMSTAR). While the AMSTAR has been shown to be valid and reliable in assessing the methodological quality of reviews, the R-AMSTAR adds a quantitative score that is easy to interpret [15,21,22]. Two independent reviewers graded the reviews, with any discrepancies being resolved by discussion with a third reviewer. R-AMSTAR items were graded as per guidelines provided by Kung et al. (2010) [15]. Percentile ranks were not calculated in this systematic review due to the small number of reviews being considered. Following grading of the methodological quality of the five systematic reviews, the percentage agreement and kappa score of agreement, with 95% confidence interval, between the two independent reviewers were calculated.
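As an illustration of the scoring scheme, the sketch below sums 11 item scores, each between 1 and 4, into a total between 11 and 44, and also lists weak items so that a high total does not hide an individual flaw. The function names and the flagging threshold are hypothetical choices for this example, not part of the R-AMSTAR itself.

```python
N_ITEMS = 11          # number of R-AMSTAR items
MIN_SCORE, MAX_SCORE = 1, 4  # permitted score range per item


def ramstar_total(item_scores):
    """Sum the 11 item scores after validating their count and range."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    for score in item_scores:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError("each item score must be between 1 and 4")
    return sum(item_scores)


def low_scoring_items(item_scores, threshold=2):
    """Return 1-based item numbers scoring at or below the (assumed) threshold."""
    return [i + 1 for i, score in enumerate(item_scores) if score <= threshold]
```

For example, hypothetical scores of `[4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 4]` give a total of 41 out of 44 while still flagging item 8.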

Data extraction and syntheses
The following data were extracted and synthesised from selected papers:
1. Author(s), year of publication, and reference of systematic reviews. Descriptive statistics were used to summarise the number of systematic reviews and dates of publication.
2. The findings and conclusions of systematic reviews in relation to pain and disability, including effect sizes and 95% confidence intervals provided by meta-analyses.
3. Author(s), year of publication, and reference of primary studies included in the systematic reviews. Descriptive statistics were used to summarise the number of primary studies, and differences in included primary studies across systematic reviews.
4. The NHMRC level of evidence and R-AMSTAR scores for methodological quality were calculated for each review and tabulated alongside author(s) and year of publication.
5. The research questions of systematic reviews in terms of study population, intervention, comparisons, and outcome measures. This included consideration of systematic review aims, and corresponding included primary study details.

Results
A total of 44 papers were identified using the search strategy described in the methods. Five of these papers fulfilled selection criteria [6][7][8][9][10]. There was 100% agreement between the two independent reviewers on the selection of the systematic reviews. Most papers were excluded because they were duplicates or did not use a systematic review methodology (Figure 2).

Findings of systematic reviews
The five reviews had conflicting conclusions regarding the effectiveness of Pilates in reducing pain and disability in people with CLBP (Table 7). Three of the reviews conducted meta-analyses [7,8,10]. Aladro-Gonzalvo et al. (2012) [10] also conducted a meta-regression analysis to identify covariates that may have contributed to the heterogeneity of treatment effect across studies [23]. No predictor variable, however, was identified.
[...] [7] on participants with low back pain lasting more than 6 weeks. The authors of these reviews, however, included primary studies with participants with acute, subacute, recurrent or chronic low back pain (Table 4).
b) Intervention
Diverse Pilates exercise protocols for people with low back pain were reported across reviews [...] [9], however, included a primary study where treatment involved yoga, rehabilitation, and physical therapy as well [30].
c) Comparison
Comparison treatments varied considerably, ranging from no exercise to usual care, massage, physiotherapy, and alternative exercises (Table 4). Usual care comparison treatments also differed, ranging from education and medication, to physiotherapy and bracing [25,30,31]. Co-interventions were also evident in two primary studies [29,31]. There was also inconsistency across reviews regarding the description of comparison physiotherapy treatment within the Obrien et al. [...]

5. Was a list of studies (included and excluded) provided?
6. Were the characteristics of the included studies provided?
7. Was the scientific quality of the included studies assessed and documented?
8. Was the scientific quality of the included studies used appropriately in formulating conclusions?
9. Were the methods used to combine the findings of studies appropriate?
11. Was the conflict of interest stated?

[...] [35] were not included in this review.

Level of evidence
There was 100% agreement between reviewers regarding the methodological design, and level of evidence of the primary studies and the systematic reviews. Primary studies consisted of randomised controlled trials (n=4), pseudo-randomised controlled trials (n=5), and a parallel case series (n=1). According to the National Health and Medical Research Council (NHMRC) hierarchy, the level of evidence represented by these primary studies ranges from Level II to Level IV evidence [1] (Table 6).

Figure 2. Flow of papers through the selection process: records identified through database searching (n=44); papers excluded (n=9); full-text papers excluded (n=3; did not investigate Pilates exercise in people with chronic low back pain). Screening criteria listed: systematic review; published in the English language; efficacy of Pilates exercise in the management of people with chronic low back pain.

[...] [27], but included two pseudo-randomised controlled trials [31,32]. This means that the systematic review by Pereira et al. [...]

Methodological quality
The two reviewers agreed on 84% of R-AMSTAR scores across the systematic reviews (46/55). Different scores were obtained for criteria 9 and 10 for Aladro-Gonzalvo et al. [...] [9]. The inter-rater agreement for R-AMSTAR scores remained "substantial" when chance agreement was eliminated (kappa: 0.78, 95% confidence interval: 0.71-0.85) [37]. All disagreements were resolved through discussion with a third reviewer. The R-AMSTAR scores of methodological quality of systematic reviews ranged from 19 to 37 out of 44 (Table 3).
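The two agreement statistics reported here can be sketched as follows. The percentage agreement is simply 46 of 55 item scores (five reviews with 11 items each), which rounds to 84%. The kappa function below is the standard unweighted Cohen's kappa; whether the authors used an unweighted or weighted form is not stated, so treating it as unweighted is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b) / 100.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Reported agreement: 46 matching scores out of 55.
reported = 100.0 * 46 / 55  # about 83.6, reported as 84%
```

Kappa is lower than raw agreement whenever some matches would be expected by chance, which is why both figures are reported.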

Discussion
This systematic review identified five published reviews that have investigated the efficacy of Pilates exercise in the treatment of people with CLBP [6][7][8][9][10]. These reviews have different conclusions, despite having similar research aims. To interpret results of reviews, a comparison of research questions, included primary studies, the level of evidence, and the methodological quality of systematic reviews was undertaken (Figure 1). This process assisted in identifying and understanding the reasons for the different review findings, and in considering the validity of those findings [36]. [...] [8] relate to people with non-specific low back pain. Non-specific low back pain is pain in the lower back without an identifiable pathology [39]. In contrast, Posadzki et al. (2011) [9] included an additional primary study with participants with low back pain related to disc pathology in the lumbar spine [30]. Further research into the effectiveness of Pilates in people with low back pain with specific pathologies should be undertaken so that conclusions can be made regarding the efficacy of Pilates in people with all forms of low back pain [36].

Research questions
With regard to treatment, Aladro-Gonzalvo et al. [...] [9], however, included a primary study that evaluated the effectiveness of an intervention that was only part-Pilates [30]. Treatment effects reported by this review may consequently relate to therapies other than Pilates that were provided to the intervention group [40].
Pilates exercise protocols varied considerably across primary studies (Table 4). Authors of reviews reported Pilates exercise sessions of 15-60 minutes duration, 1-7 times per week, for 10 days and up to 12 months [6][7][8][9][10]. There was also variation in the use of mat versus specialised equipment, and incorporation of home exercises [7]. Further research is therefore required to define the essential elements of Pilates exercise in people with chronic low back pain [10]. In terms of comparison treatments, usual care was defined differently across the primary studies [25,30,31]. This may have resulted in an inaccurate measurement of Pilates treatment effect as participants had variable types and amounts of "usual care" in both treatment and comparison groups [40]. Pereira [...] [7] reported physiotherapy treatment as also involving other modalities. This may have also contributed to inaccurate measurements of treatment effect with the pooling of primary studies with variable comparison treatments [40].
Similar outcome measures were used in primary studies to assess the effect of Pilates on pain and disability. The majority of these outcome measures are validated for use in people with low back pain, and have been found to be reliable [33,34,41]. The different treatment effects reported by Lim et al. (2011) [7] and Pereira et al. (2012) [8], however, could relate to the use of different outcome measures for pain intensity provided for Anderson (2005) [24].
Different findings between meta-analyses could also relate to different grouping of primary studies. [...] [8] did not. Classifying alternative exercise to Pilates as a "minimal intervention" could be considered inappropriate as exercise has been found to reduce pain and disability in people with CLBP [38]. Effect sizes for Pilates may therefore be more conservative in Aladro-Gonzalvo et al. (2012) [40].

Included primary studies
A comparison of included primary studies in reviews was undertaken as incorporating additional evidence can lead to different results [42]. Nine of the primary studies were available at the time of publication of the first systematic review [6]. La Touche et al. (2008) [6] and Posadzki et al. (2011) [9], however, chose to exclude unpublished primary studies and abstract articles (Table 7). This means that the findings of these reviews could be inflated as unpublished studies often have outcomes that are less positive or not statistically significant [43].
In [...] [8] included several unpublished theses and an abstract study in their reviews (Table 5). These reviews, then, are likely to have less publication bias and more realistic findings [43]. Pereira et al. (2012) [8] also excluded primary studies that had a high risk of bias as defined by the Cochrane Back Review Group [36]. This review's findings may therefore have greater credibility than other reviews [44].
The meta-regression analysis undertaken by Aladro-Gonzalvo et al. (2012) [10] did not identify any predictor variables that could explain differences in treatment effects across studies. This is not surprising, however, as the power of the meta-regression was limited by the small number of studies and their heterogeneity [23,45]. The rationale for examining several covariates is also questionable, and aggregation bias is likely, as client-specific characteristics such as the duration of complaint were taken from the mean results of studies rather than individual participant data [23,46,47].

Level of evidence
The NHMRC level of evidence of all reviews was lower than expected for systematic reviews due to the inclusion of primary studies that were not randomised controlled trials. Aladro-Gonzalvo et al. [...] [9] represent the lowest level of evidence (Level IV) on the NHMRC hierarchy [1]. This is because these reviews included Donzelli et al. (2006) [27], a parallel case series article. Pereira et al. (2012) [8], however, represents Level III evidence on the NHMRC hierarchy as this review included only pseudo-randomised and randomised controlled trials. This means the findings of all reviews may contain bias related to the methodological design of primary studies, but Pereira et al. (2012) [8] may be less biased than other reviews [1,48].

Methodological quality
The methodological quality of reviews was analysed to assist in the interpretation of findings [5]. The R-AMSTAR provided a numerical score of methodological quality for each review based on AMSTAR criteria [11]. The AMSTAR is reported as valid and reliable in assessing methodological quality of systematic reviews [5,15,21,22]. The inter-rater agreement for R-AMSTAR scores remained "substantial" as indicated by a kappa score of 0.78, 95% confidence interval: 0.71-0.85 [37]. This is similar to other scores reported for AMSTAR in the literature [22]. R-AMSTAR scores provide an indication of level of bias in review findings with high scores indicating greater credibility of findings [15]. Findings of Aladro-Gonzalvo et al. (2012) [10], which scored 37/44, can therefore be considered to be the most robust in relation to the methodological quality of systematic reviews. Examining individual item scores with the R-AMSTAR, however, is also critical to identify factors that influence the credibility of findings. This is despite significant methodological flaws being identified in primary studies, such as small sample sizes, baseline differences between treatment and control groups, high drop-out rates, and lack of assessor blinding and intention-to-treat analyses [6,7,9]. [...] [8], therefore, need to be interpreted with caution as these factors were not considered [49].
There is also a concern that the high R-AMSTAR scores of Aladro-Gonzalvo et al. [...] [8] pooled the results of primary studies that had similar comparison groups, but different treatment protocols, outcome measures, and timing of re-assessments (Table 2). This clinical heterogeneity should have indicated that conducting a meta-analysis was inappropriate [36]. This is because pooling heterogeneous studies can produce inaccurate treatment effects [15,50,51].
Significant statistical heterogeneity (for example I² > 60%) was also reported in both reviews when Pilates was compared to usual care [7,8,10]. This again suggests meta-analysis is inappropriate [52]. Using a random effects model to compensate for heterogeneity may have helped to improve the accuracy of findings, but it does not explain or remove the primary study differences [36]. Moreover, combining too few primary studies in a meta-analysis can also produce misleading results [53].
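For readers unfamiliar with the heterogeneity statistic cited above, the sketch below shows the usual way I² is derived from Cochran's Q using inverse-variance weights. It is a generic illustration with hypothetical inputs, not a reconstruction of the calculations in the reviews discussed; the function name is an assumption.

```python
def i_squared(effects, variances):
    """Return I² (as a percentage) for study effect sizes and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q == 0:
        return 0.0  # identical effects, no observed heterogeneity
    # Share of variation beyond what chance alone would produce, floored at 0.
    return max(0.0, (q - df) / q) * 100.0
```

With only two or three studies, Q has very few degrees of freedom, so I² is itself imprecise; this is consistent with the caution above about pooling too few primary studies.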

Conclusion
We agree with Posadzki et al. (2011) [9] that there is inconclusive evidence that Pilates is effective in reducing pain and disability in people with CLBP. This conclusion relates to the insufficient number and methodological quality of available primary studies, rather than the methodological quality of reviews. Subsequent systematic reviews need to ensure that conclusions consider the methodological design and quality of primary studies. Meta-analyses and meta-regression analyses should also not be conducted when there is significant clinical and statistical heterogeneity across studies, and when primary studies are few in number. The Revised Assessment of Multiple Systematic Reviews provides a useful method of appraising the methodological quality of systematic reviews. Individual item scores, however, need to be examined in addition to total scores. This will ensure that significant methodological flaws are not missed, and results of reviews are interpreted appropriately.

Competing interests
There are no financial or non-financial competing interests for any of the authors of this review.

Table note: SMD, standardised mean difference; 95% CI, 95% confidence interval. + Back school exercise includes respiratory and postural education, muscle strengthening and mobilisation exercise [7,23].