Systematic reviews provide clinical practice recommendations that are based on evaluation of primary evidence. When systematic reviews with the same aims have different conclusions, it is difficult to ascertain which review reported the most credible and robust findings.
This study examined five systematic reviews that have investigated the effectiveness of Pilates exercise in people with chronic low back pain. A four-stage process was used to interpret findings of the reviews. This process included comparison of research questions, included primary studies, and the level and quality of evidence of systematic reviews. Two independent reviewers assessed the level of evidence and the methodological quality of systematic reviews, using the National Health and Medical Research Council hierarchy of evidence, and the Revised Assessment of Multiple Systematic Reviews respectively. Any disagreements were resolved by a third researcher.
A high level of consensus was achieved between the reviewers. Conflicting findings were reported by the five systematic reviews regarding the effectiveness of Pilates in reducing pain and disability in people with chronic low back pain. Authors of the systematic reviews included primary studies that did not match their questions in relation to treatment or population characteristics. A total of ten primary studies were identified across five systematic reviews. Only two of the primary studies were included in all of the reviews due to different inclusion criteria relating to publication date and status, definition of Pilates, and methodological quality. The level of evidence of reviews was low due to the methodological design of the primary studies. The methodological quality of reviews varied. Those which conducted a meta-analysis obtained higher scores.
There is inconclusive evidence that Pilates is effective in reducing pain and disability in people with chronic low back pain. This is due to the small number and poor methodological quality of primary studies. The Revised Assessment of Multiple Systematic Reviews provides a useful method of appraising the methodological quality of systematic reviews. Individual item scores, however, should be examined in addition to total scores, so that significant methodological flaws of systematic reviews are not missed, and results are interpreted appropriately.
Pilates, Exercise, Low back pain, Systematic review
Systematic reviews are ranked as the most valid form of research in several hierarchies of evidence [1, 2]. They provide evidence-based recommendations from the synthesis and critical appraisal of primary studies. Within health care, systematic reviews are used to efficiently obtain advice regarding client management. Conflicting results of systematic reviews, however, create confusion for readers.
Several recently published systematic reviews have investigated the effectiveness of Pilates in people with chronic low back pain (CLBP) [6–10]. Pilates is a mind-body exercise that targets core stability, strength, flexibility, posture, breathing, and muscle control. It has been recommended in the management of people with CLBP, as this type of exercise may strengthen deep, stabilising muscles that support the lumbar spine, such as transversus abdominis [6, 12]. These muscles are inhibited in people with CLBP [13, 14].
Reviews examining the efficacy of Pilates in people with CLBP, however, report different conclusions. La Touche et al. (2008) suggested that Pilates reduces pain and disability, while Lim et al. (2011) reported that Pilates reduces pain when compared to minimal treatments, but not disability. In contrast, Pereira et al. (2012) concluded that Pilates is ineffective in reducing pain and disability, and Posadzki et al. (2011) suggested that evidence was inconclusive. Aladro-Gonzalvo et al. (2012) also provided conflicting results, reporting that Pilates may reduce pain only when compared to minimal intervention, and disability only when compared to other physiotherapeutic treatments. These contradictory findings make it difficult to draw conclusions about the efficacy of Pilates in people with CLBP and to guide its use in clinical settings.
A systematic review of reviews was conducted to critically evaluate and summarise the results of all published systematic reviews that have investigated the effectiveness of Pilates exercise in reducing pain and disability in people with CLBP. Areas for improvement for systematic reviews were subsequently identified, and an evidence-based conclusion provided regarding the efficacy of Pilates exercise in people with CLBP.
A four-stage process was used to determine the appropriateness of systematic review conclusions. This involved comparison of reviews with respect to research questions, included primary studies, their level of evidence and methodological quality (Figure 1). The level of evidence of the reviews was assessed using the National Health and Medical Research Council hierarchy of evidence (2009), while the methodological quality was assessed using the Revised Assessment of Multiple Systematic Reviews (R-AMSTAR). Systematic review findings were then interpreted with respect to these factors.
A systematic review design was chosen over a narrative review as it limits bias in the selection and appraisal of evidence [16–18]. In a systematic review, a comprehensive search of the literature is undertaken to answer a focused research question; the search strategy and the criteria for selection and critical appraisal of literature are defined; quantitative rather than qualitative results are reported; and evidence-based inferences are made. This systematic review was written to meet Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
A comprehensive literature search was undertaken using ten databases: Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane Library, Medline, Physiotherapy Evidence Database (PEDro), ProQuest: Health and Medical Complete, ProQuest: Nursing and Allied Health Source, ProQuest Research Library: Health and Medicine, Scopus, SPORTDiscus, and Web of Science. The standardised search strategy used the Medical Subject Headings (MeSH) terms “Pilates” and “Low Back Pain”, and the search term “Review”, in the title, abstract and, where available, keyword fields, within the maximal date range of each database up to November 4, 2012 (Table 1).
Search strategy: using Medical Subject Headings (MeSH) “Pilates” and “Low Back Pain”, and search term “Review”
Cumulative Index to Nursing and Allied Health Literature (CINAHL)
Title, Abstract, or Word in Subject Heading
Title, Abstract or Keyword
Title, Abstract or Keyword
Physiotherapy Evidence Database (PEDro)
Title and Abstract
Medical and Health Complete
Title, Abstract, or Subject Heading
Nursing and Allied Health Source
Title, Abstract, or Keyword
Title, Abstract, or Keyword
Web of Science
Topic or Title
Preliminary searching revealed that expanding the search to include “exercise”, “motor control”, and “core stability” did not identify any additional reviews, nor did changing the Boolean operator to “or”. Removing “Low Back Pain” and “Review” also did not help identify any additional systematic reviews. Secondary searching of reference lists of included papers was undertaken to identify any additional, relevant studies that met the inclusion criteria.
Selection of relevant papers was based on the title, and if required, review of the abstract or full text of the document. Papers identified from the search process were assessed against inclusion and exclusion criteria by two independent reviewers (CW, BH). If there were any discrepancies in selected papers between the two reviewers, a third reviewer (AB) independently reviewed the papers and through discussion, obtained a consensus.
To be included in this systematic review, systematic reviews needed to:
· Be identified as a systematic review of 2 or more intervention studies. In a systematic review, a comprehensive search of the literature is undertaken to answer a focused research question; the search strategy and the criteria for selection and critical appraisal of literature are defined; quantitative rather than qualitative results are reported; and evidence-based inferences are made [16, 17]. Narrative reviews or expert commentaries did not meet inclusion requirements.
· Be published in the English language. For ease of interpretation and access, reviews that were unpublished or published in another language were excluded.
· Include human participants with chronic low back pain, that is, localised pain in the lumbar region that lasts for more than three months . If reviews only included participants with low back pain lasting less than three months, they were excluded.
· Assess the effectiveness of Pilates, where the term “Pilates” was used to describe the type of prescribed exercise being investigated. Exercises described as “motor control” or “lumbar stabilisation” did not suffice for Pilates. This is because Pilates may include features in addition to these exercise approaches .
· Use outcome measures to evaluate disability, that is, impairments, activity limitations or participation restrictions according to the International Classification of Functioning, Disability and Health (ICF). Pain is considered a functional impairment in the ICF.
Level of evidence
According to the NHMRC hierarchy, the level of evidence of a systematic review depends on the methodological design of included primary studies. Systematic reviews that include only randomised controlled trials are rated as the highest form of evidence. Systematic reviews that include studies other than randomised controlled trials are rated only as high as the lowest level of evidence represented by primary studies (Table 2). Two independent reviewers graded the level of evidence of systematic reviews according to the NHMRC hierarchy of evidence. Any discrepancies between the two reviewers were discussed with a third reviewer to obtain a consensus.
Modified National Health and Medical Research Council (NHMRC) hierarchy of evidence
Level: Type of Intervention
Level I: Systematic Review of Randomised Controlled Trials
Level II: Randomised Controlled Trial
Level III: Pseudo-Randomised Controlled Trial, Comparative Study with or without Concurrent Controls
Level IV: Case Series with either post-test or pre-test/post-test outcomes
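The grading rule described above (a review of randomised controlled trials only is the highest level; otherwise the review takes the lowest level found among its primary studies) can be sketched in a few lines of Python. This is an illustrative sketch only: the design keys and the function name `review_level` are hypothetical, and the level assigned to each design is inferred from how the levels are used later in this paper.

```python
# Level of each primary study design (larger number = weaker evidence).
# The keys are hypothetical labels chosen for this sketch.
LEVEL_BY_DESIGN = {
    "rct": 2,          # randomised controlled trial
    "pseudo_rct": 3,   # pseudo-randomised controlled trial
    "comparative": 3,  # comparative study with or without concurrent controls
    "case_series": 4,  # case series
}
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


def review_level(primary_designs):
    """Return the level of evidence of a systematic review."""
    if not primary_designs:
        raise ValueError("a review must include at least one primary study")
    levels = [LEVEL_BY_DESIGN[design] for design in primary_designs]
    # A review that contains only randomised controlled trials is Level I.
    if all(level == 2 for level in levels):
        return ROMAN[1]
    # Otherwise the review is rated as low as its weakest primary study.
    return ROMAN[max(levels)]


print(review_level(["rct", "rct"]))
print(review_level(["rct", "pseudo_rct"]))
print(review_level(["rct", "pseudo_rct", "case_series"]))
```

Under these assumptions, a single case series among the included studies is enough to place the whole review at Level IV.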
The methodological quality of included systematic reviews was evaluated using the R-AMSTAR. The R-AMSTAR rates the methodological quality of systematic reviews by providing a numerical score for each of 11 items (Table 3). Each item is scored from 1 to 4, where 1 indicates poor methodological quality and 4 indicates excellent methodological quality, giving a total score out of 44. R-AMSTAR items originate from the Assessment of Multiple Systematic Reviews (AMSTAR). While the AMSTAR has been shown to be valid and reliable in assessing the methodological quality of reviews, the R-AMSTAR adds a quantitative score that is easy to interpret [15, 21, 22].
R-AMSTAR scores for methodological quality of systematic reviews
Was there duplicate study selection and data extraction?
Was a comprehensive literature search performed?
Was the status of publication (i.e. grey literature) used as an inclusion criterion?
Was a list of studies (included and excluded) provided?
Were the characteristics of the included studies provided?
Was the scientific quality of the included studies assessed and documented?
Was the scientific quality of the included studies used appropriately in formulating conclusions?
Were the methods used to combine the findings of studies appropriate?
Was the likelihood of publication bias (a.k.a. “file drawer” effect) assessed?
Was the conflict of interest stated?
Score of 1 if satisfied 0 of the criteria [Items 1, 2, 4, 6, 10, 11] or 0 or 1 of the criteria [Items 3, 5, 7–9]
Score of 2 if satisfied 1 of the criteria [Items 1, 2, 4, 6, 10, 11] or 2 of the criteria [Items 3, 5, 7–9]
Score of 3 if satisfied 2 of the criteria [Items 1, 2, 4, 6, 10, 11] or 3 of the criteria [Items 3, 5, 7–9]
Score of 4 if satisfied 3 of the criteria [Items 1, 2, 4, 6, 10, 11] or 4 of the criteria [Items 3, 5, 7–9]
Adapted from Kung, Chiappelli, Cajulis, Avezova, Kossan, 2010.
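The scoring rule in Table 3 can be expressed as a small function, which may help when checking item and total scores by hand. The sketch below is a hedged illustration rather than an official implementation: the names `item_score` and `total_score` are hypothetical, and treating more than four satisfied criteria as a score of 4 is an assumption made here.

```python
# Items scored on three criteria versus items scored on four criteria,
# following the item groupings given in Table 3.
THREE_CRITERIA_ITEMS = {1, 2, 4, 6, 10, 11}
FOUR_CRITERIA_ITEMS = {3, 5, 7, 8, 9}


def item_score(item, criteria_met):
    """Convert the number of satisfied criteria into an item score of 1 to 4."""
    if criteria_met < 0:
        raise ValueError("criteria_met cannot be negative")
    if item in THREE_CRITERIA_ITEMS:
        if criteria_met > 3:
            raise ValueError("this item has only three criteria")
        return criteria_met + 1  # 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 4
    if item in FOUR_CRITERIA_ITEMS:
        # 0 or 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4 (capping at 4 is an assumption)
        return min(max(criteria_met, 1), 4)
    raise ValueError("item must be between 1 and 11")


def total_score(criteria_met_by_item):
    """Sum the 11 item scores; the result lies between 11 and 44."""
    if set(criteria_met_by_item) != set(range(1, 12)):
        raise ValueError("scores are required for items 1 to 11")
    return sum(item_score(i, n) for i, n in criteria_met_by_item.items())


none_met = {i: 0 for i in range(1, 12)}
all_met = {i: (3 if i in THREE_CRITERIA_ITEMS else 4) for i in range(1, 12)}
print(total_score(none_met), total_score(all_met))
```

Because every item contributes at least 1 point, the lowest possible total is 11 rather than 0, which is worth keeping in mind when comparing totals out of 44.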
Two independent reviewers graded the reviews, with any discrepancies being resolved by discussion with a third reviewer. R-AMSTAR items were graded as per guidelines provided by Kung et al. (2010). Percentile ranks were not calculated in this systematic review due to the small number of reviews being considered. Following grading of the methodological quality of the five systematic reviews, the percentage agreement and kappa score of agreement, with its 95% confidence interval, between the two independent reviewers were calculated.
Data extraction and synthesis
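For readers unfamiliar with the two agreement statistics, the following Python sketch shows how percentage agreement and an unweighted Cohen's kappa are computed from two raters' item scores. It is an illustration under stated assumptions: the source does not say whether a weighted or unweighted kappa was used, the function names are hypothetical, the ratings are invented example data (not the reviewers' actual scores), and the confidence interval is not computed here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently with their own marginal frequencies.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented example ratings on the 1 to 4 scale (not data from this review).
rater_a = [4, 3, 2, 1, 4, 3, 2, 1]
rater_b = [4, 3, 2, 1, 4, 3, 1, 2]
print(percent_agreement(rater_a, rater_b))
print(round(cohens_kappa(rater_a, rater_b), 3))
```

In this invented example the raters agree on 6 of 8 items (75%), but kappa is lower (about 0.67) because a quarter of that agreement would be expected by chance alone.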
The following data were extracted and synthesised from selected papers:
Author(s), year of publication, and reference of systematic reviews. Descriptive statistics were used to summarise the number of systematic reviews and dates of publication.
The findings and conclusions of systematic reviews in relation to pain and disability, including effect sizes and 95% confidence intervals provided by meta-analyses.
Author(s), year of publication, and reference of primary studies included in the systematic reviews. Descriptive statistics were used to summarise the number of primary studies, and differences in included primary studies across systematic reviews.
The NHMRC level of evidence and R-AMSTAR scores for methodological quality were calculated for each review and tabulated alongside author(s) and year of publication.
The research questions of systematic reviews in terms of study population, intervention, comparisons, and outcome measures. This included consideration of systematic review aims, and corresponding included primary study details.
A total of 44 papers were identified using the search strategy described in the methods. Five of these papers fulfilled selection criteria [6–10]. There was 100% agreement between the two independent reviewers on the selection of the systematic reviews. Most papers were excluded due to being duplicates, or not using a systematic review methodology (Figure 2).
Findings of systematic reviews
The five reviews had conflicting conclusions regarding the effectiveness of Pilates in reducing pain and disability in people with CLBP (Table 7). Three of the reviews conducted meta-analyses [7, 8, 10]. Aladro-Gonzalvo et al. (2012) also conducted a meta-regression analysis to identify covariates that may have contributed to the heterogeneity of treatment effect across studies. No predictor variable, however, was identified.
The authors of all reviews, apart from Posadzki et al. (2011), failed to ensure that the duration of symptoms reported by participants in primary studies matched their research questions. For example, La Touche et al. (2008) and Pereira et al. (2012) aimed to focus on participants with CLBP, and Aladro-Gonzalvo et al. (2012) and Lim et al. (2011) on participants with low back pain lasting more than 6 weeks. The authors of these reviews, however, included primary studies with participants with acute, subacute, recurrent or chronic low back pain (Table 4).
Diverse Pilates exercise protocols for people with low back pain were reported across reviews (Table 4). In the majority of primary studies, authors prescribed Pilates mat exercises, although Anderson (2005)  and Rydeard et al. (2006)  suggested use of specialised Pilates equipment. Only 60% of primary studies described home exercises as part of the Pilates protocol [24–29].
The validity of Pilates exercise interventions in reviews also varied. La Touche et al. (2008) , Lim et al. (2011) , Pereira et al. (2012) , and Aladro-Gonzalvo et al. (2012) , ensured that treatments in primary studies were described solely as Pilates exercise. Posadzki et al. (2011) , however, included a primary study where treatment involved yoga, rehabilitation, and physical therapy as well .
Comparison treatments varied considerably, ranging from no exercise, usual care, massage, physiotherapy, and alternative exercises (Table 4). Usual care comparison treatments also differed, ranging from education and medication, to physiotherapy and bracing [25, 30, 31]. Co-interventions were also evident in two primary studies [29, 31].
There was also inconsistency across reviews regarding the description of the comparison physiotherapy treatment within the O’Brien et al. (2006) study. Pereira et al. (2012) defined the type of physiotherapy as lumbar stabilisation exercise, while Lim et al. (2011) reported that the physiotherapy treatment included other modalities as well.
d) Outcome measures
Similar outcome measures were used across primary studies and in the systematic reviews (Table 4). Lim et al. (2011) , Aladro-Gonzalvo et al. (2012) , and Pereira et al. (2012) , however, elected to use different outcome measures for pain given in the same primary study (Anderson, 2005) . That is, Lim et al. (2011)  and Aladro-Gonzalvo et al. (2012)  used the Miami Back Pain Index scores , while Pereira et al. (2012) used pain scores given within the Short Form Health Survey (SF-36) .
Although similar outcome measures were used across reviews, participants were evaluated at different points in time across primary studies. Timing of evaluation was dependent on the duration of the Pilates treatment and the length of follow up. The shortest follow up was at 6 weeks [28, 31, 32] and longest follow up assessment was at 12 months following the completion of Pilates treatment [24, 25, 30].
Description of population, intervention, comparison, outcomes measures in systematic reviews
Nonspecific low back pain greater than 6 weeks or recurrent (twice/year); specific low back pain with disc pathology greater than 6 weeks
15–60 minute sessions
Usual care, back school exercise+
Pain: NRS−11, NRS–101, RMVAS, VAS
Disability: ODQ, RMDQ
10 days–12 months
Abbreviations: MBI-pain - Miami Back Index pain subscale; NRS −11 - 11 point Numeric Rating Scale; NRS −101 - 101 point Numeric Rating Scale; ODI - Oswestry Disability Index; ODQ - Oswestry Low Back Pain Questionnaire; RMDQ - Roland Morris Disability Questionnaire; RMVAS -Roland Morris Visual Analog Scale; SF-36 Pain - Short Form Health Survey – Pain; VAS - Visual Analog Scale.
+ Back school exercise includes respiratory and postural education, muscle strengthening and mobilisation exercise [7, 23].
Included primary studies
There were ten different primary studies identified across the five systematic reviews [24–32, 35] (Table 5). La Touche et al. (2008)  and Posadzki et al. (2011)  included only studies published in full, as opposed to Aladro-Gonzalvo et al. (2012) , Lim et al. (2011) , and Pereira et al. (2012)  who included studies that were unpublished, or part-published [24, 28, 29, 32, 35]. Pereira et al. (2012)  also only included studies that had low risk of bias as defined by the Cochrane Back Review Group . This meant that Donzelli et al. (2006)  and Quinn (2005)  were not included in this review.
There was 100% agreement between reviewers regarding the methodological design, and level of evidence of the primary studies and the systematic reviews. Primary studies consisted of randomised controlled trials (n=4), pseudo-randomised controlled trials (n=5), and a parallel case series (n=1). According to the National Health and Medical Research Council (NHMRC) hierarchy, the level of evidence represented by these primary studies ranges from Level II to Level IV evidence  (Table 6).
Primary studies: level of evidence and methodological design
Aladro-Gonzalvo et al. (2012), La Touche et al. (2008), Lim et al. (2011), and Posadzki et al. (2011) included Donzelli et al. (2006), a parallel case series article. These four reviews consequently represent Level IV evidence on the NHMRC hierarchy. Pereira et al. (2012) excluded Donzelli et al. (2006), but included two pseudo-randomised controlled trials [31, 32]. This means that the systematic review by Pereira et al. (2012) represents Level III evidence on the NHMRC hierarchy.
The two reviewers agreed on 84% of R-AMSTAR scores across the systematic reviews (46/55). Different scores were obtained for criteria 9 and 10 for Aladro-Gonzalvo et al. (2012); criteria 1, 2 and 6 for La Touche et al. (2008); criterion 3 for Lim et al. (2011); criteria 7 and 9 for Pereira et al. (2012); and criterion 8 for Posadzki et al. (2011). The inter-rater agreement for R-AMSTAR scores remained “substantial” when chance agreement was eliminated (kappa: 0.78, 95% confidence interval: 0.71-0.85). All disagreements were resolved through discussion with a third reviewer.
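The reported 46/55 figure can be checked directly from the disagreements listed above. The short Python tally below uses only the item numbers given in the text; the variable names are arbitrary.

```python
ITEMS_PER_REVIEW = 11  # the R-AMSTAR has 11 items

# R-AMSTAR items on which the two reviewers initially disagreed, per review.
disagreements = {
    "Aladro-Gonzalvo et al. (2012)": [9, 10],
    "La Touche et al. (2008)": [1, 2, 6],
    "Lim et al. (2011)": [3],
    "Pereira et al. (2012)": [7, 9],
    "Posadzki et al. (2011)": [8],
}

total = ITEMS_PER_REVIEW * len(disagreements)
disagreed = sum(len(items) for items in disagreements.values())
agreed = total - disagreed
percent = 100 * agreed / total

print(f"{agreed}/{total} = {percent:.0f}%")  # prints "46/55 = 84%"
```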
The R-AMSTAR scores of methodological quality of systematic reviews ranged from 19 to 37 out of 44 (Table 3). Aladro-Gonzalvo et al. (2012) achieved the highest total score (37/44), followed by Lim et al. (2011) (35/44), Pereira et al. (2012) (32/44), Posadzki et al. (2011) (30/44), and La Touche et al. (2008) (19/44). The R-AMSTAR scores indicate that all reviews lacked a thorough assessment of publication bias and a statement regarding conflict of interest. Duplicate study selection and data extraction, inclusion of grey literature, listing of excluded studies, and documentation of study characteristics were also insufficient in several reviews [6–10].
Finally, R-AMSTAR scores identified that La Touche et al. (2008) and Pereira et al. (2012) needed to improve consideration of the methodological quality of the primary studies when formulating conclusions. Also, La Touche et al. (2008) and Posadzki et al. (2011) did not provide a justification for not undertaking a meta-analysis, and Lim et al. (2011) and Aladro-Gonzalvo et al. (2012) needed to improve their method of combining findings of primary studies in their meta-analyses.
This systematic review identified five published reviews that have investigated the efficacy of Pilates exercise in the treatment of people with CLBP [6–10]. These reviews have different conclusions, despite having similar research aims. To interpret results of reviews, a comparison of research questions, included primary studies, the level of evidence, and the methodological quality of systematic reviews was undertaken (Figure 1). This process assisted in identifying and understanding the reasons for the different review findings, and in considering the validity of those findings .
La Touche et al. (2008) and Posadzki et al. (2011) included primary studies with participants with subacute, chronic or recurrent low back pain. Meanwhile, Aladro-Gonzalvo et al. (2012), Lim et al. (2011) and Pereira et al. (2012) incorporated an additional primary study that also included participants with acute low back pain. Outcomes reported by Aladro-Gonzalvo et al. (2012), Lim et al. (2011) and Pereira et al. (2012) may therefore be conservative and underestimate the effects of Pilates in people with CLBP, as people with acute low back pain tend to respond less favourably to exercise.
The findings of Aladro-Gonzalvo et al. (2012), La Touche et al. (2008), Lim et al. (2011), and Pereira et al. (2012) relate to people with non-specific low back pain. Non-specific low back pain is pain in the lower back without an identifiable pathology. In contrast, Posadzki et al. (2011) included an additional primary study with participants with low back pain related to disc pathology in the lumbar spine. Further research into the effectiveness of Pilates in people with low back pain with specific pathologies should be undertaken so that conclusions can be made regarding the efficacy of Pilates in people with all forms of low back pain.
With regard to treatment, Aladro-Gonzalvo et al. (2012), La Touche et al. (2008), Lim et al. (2011), and Pereira et al. (2012) included primary studies that investigated only Pilates exercise. Posadzki et al. (2011), however, included a primary study that evaluated the effectiveness of an intervention that was only part-Pilates. Treatment effects reported by this review may consequently relate to therapies other than Pilates that were provided to the intervention group.
Pilates exercise protocols varied considerably across primary studies (Table 4). Authors of reviews reported Pilates exercise sessions of 15–60 minutes duration, 1–7 times per week, for 10 days and up to 12 months [6–10]. There was also variation in the use of mat versus specialised equipment, and incorporation of home exercises . Further research is therefore required to define the essential elements of Pilates exercise in people with chronic low back pain .
In terms of comparison treatments, usual care was defined differently across the primary studies [25, 30, 31]. This may have resulted in an inaccurate measurement of Pilates treatment effect as participants had variable types and amounts of “usual care” in both treatment and comparison groups. Pereira et al. (2012) and Lim et al. (2011) also described physiotherapy interventions provided by O’Brien et al. (2006) differently. Pereira et al. (2012) considered physiotherapy to consist of only lumbar stabilisation exercise, whereas Lim et al. (2011) reported physiotherapy treatment as also involving other modalities. Pooling primary studies with variable comparison treatments may also have contributed to inaccurate measurements of treatment effect.
Similar outcome measures were used in primary studies to assess the effect of Pilates on pain and disability. The majority of these outcome measures are validated for use in people with low back pain, and have been found to be reliable [33, 34, 41]. The different treatment effects reported by Lim et al. (2011)  and Pereira et al. (2012) , however, could relate to the use of different outcome measures for pain intensity provided for Anderson (2005) .
Different findings between meta-analyses could also relate to different grouping of primary studies. For example, Aladro-Gonzalvo et al. (2012)  considered alternative exercise to Pilates to be a minimal intervention, while Lim et al. (2011)  and Pereira et al. (2012)  did not. Classifying alternative exercise to Pilates as a “minimal intervention” could be considered inappropriate as exercise has been found to reduce pain and disability in people with CLBP . Effect sizes for Pilates may therefore be more conservative in Aladro-Gonzalvo et al. (2012) .
Included primary studies
A comparison of included primary studies in reviews was undertaken as incorporating additional evidence can lead to different results. Nine of the primary studies were available at the time of publication of the first systematic review. La Touche et al. (2008) and Posadzki et al. (2011), however, chose to exclude unpublished primary studies and abstract articles (Table 7). This means that the findings of these reviews could be inflated, as unpublished studies often have outcomes that are less positive or not statistically significant.
Findings of systematic reviews: effectiveness of Pilates in people with chronic low back pain
Note: SMD - standardised mean difference; 95% CI - 95% confidence interval.
In contrast, Aladro-Gonzalvo et al. (2012) , Lim et al. (2011)  and Pereira et al. (2012)  included several unpublished theses and an abstract study in their reviews (Table 5). These reviews, then, are likely to have less publication bias and more realistic findings . Pereira et al. (2012)  also excluded primary studies that had a high risk of bias as defined by the Cochrane Back Review Group . This review’s findings may therefore have greater credibility than other reviews .
The meta-regression analysis undertaken by Aladro-Gonzalvo et al. (2012) did not identify any predictor variables that could explain differences in treatment effects across studies. This is not surprising, however, as the power of the meta-regression was limited by the small number of studies and their heterogeneity [23, 45]. The rationale for examining several covariates is also questionable, and aggregation bias is likely, as client-specific characteristics such as the duration of complaint were taken from the mean results of studies rather than from individual participant data [23, 46, 47].
Level of evidence
The NHMRC level of evidence of all reviews was lower than expected for systematic reviews due to the inclusion of primary studies that were not randomised controlled trials. Aladro-Gonzalvo et al. (2012), La Touche et al. (2008), Lim et al. (2011), and Posadzki et al. (2011) represent the lowest level of evidence (Level IV) on the NHMRC hierarchy. This is because these reviews included Donzelli et al. (2006), a parallel case series article. Pereira et al. (2012), however, represents Level III evidence on the NHMRC hierarchy as this review included only pseudo-randomised and randomised controlled trials. This means the findings of all reviews may contain bias related to the methodological design of primary studies, but Pereira et al. (2012) may be less biased than other reviews [1, 48].
The methodological quality of reviews was analysed to assist in the interpretation of findings . The R-AMSTAR provided a numerical score of methodological quality for each review based on AMSTAR criteria . The AMSTAR is reported as valid and reliable in assessing methodological quality of systematic reviews [5, 15, 21, 22]. The inter-rater agreement for R-AMSTAR scores remained “substantial” as indicated by a kappa score of 0.78, 95% confidence interval: 0.71-0.85 . This is similar to other scores reported for AMSTAR in the literature .
R-AMSTAR scores provide an indication of level of bias in review findings with high scores indicating greater credibility of findings . Findings of Aladro-Gonzalvo et al. (2012)  which scored 37/44, can therefore be considered to be the most robust in relation to the methodological quality of systematic reviews. Examining individual item scores with the R-AMSTAR, however, is also critical to identify factors that influence the credibility of findings.
La Touche et al. (2008) and Pereira et al. (2012), for example, did not consider the methodological quality of primary studies in forming their conclusions. This is despite significant methodological flaws being identified in primary studies, such as small sample sizes, baseline differences between treatment and control groups, high drop-out rates, lack of assessor blinding, and lack of intention-to-treat analyses [6, 7, 9]. The conclusions of La Touche et al. (2008) and Pereira et al. (2012), therefore, need to be interpreted with caution as these factors were not considered.
There is also a concern that the high R-AMSTAR scores of Aladro-Gonzalvo et al. (2012), Lim et al. (2011) and Pereira et al. (2012) do not reflect the inappropriateness of conducting a meta-analysis. Aladro-Gonzalvo et al. (2012), Lim et al. (2011) and Pereira et al. (2012) pooled the results of primary studies that had similar comparison groups, but different treatment protocols, outcome measures, and timing of re-assessments (Table 4). This clinical heterogeneity should have indicated that conducting a meta-analysis was inappropriate. This is because pooling heterogeneous studies can produce inaccurate treatment effects [15, 50, 51].
Significant statistical heterogeneity (for example, I² > 60%) was also reported in all three reviews when Pilates was compared to usual care [7, 8, 10]. This again suggests that meta-analysis was inappropriate. Using a random effects model to compensate for heterogeneity may have improved the accuracy of findings, but it does not explain or remove the differences between primary studies. Moreover, combining too few primary studies in a meta-analysis can also produce misleading results. The findings of Aladro-Gonzalvo et al. (2012), Lim et al. (2011) and Pereira et al. (2012) therefore need to be interpreted carefully due to the small number and heterogeneity of primary studies.
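To make the heterogeneity threshold concrete, the sketch below computes Cochran's Q and the I² statistic for a set of study effect sizes using the standard formula I² = max(0, (Q − df) / Q) × 100%. This is an illustration only: the reviews discussed here do not report their calculations in this form, the function name `heterogeneity` is hypothetical, and the effect sizes and variances are invented rather than taken from any primary study.

```python
def heterogeneity(effects, variances):
    """Return (fixed-effect pooled estimate, Cochran's Q, I-squared in %)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity
    # rather than chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


# Invented standardised mean differences and variances for three studies.
effects = [-1.2, -0.1, -0.9]
variances = [0.04, 0.05, 0.06]
pooled, q, i_squared = heterogeneity(effects, variances)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, I^2 = {i_squared:.1f}%")
```

With these invented values I² is well above 60%, which is the kind of result that would argue against pooling. Note also that with only three studies Q has just two degrees of freedom, so I² itself is estimated imprecisely.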
We agree with Posadzki et al. (2011) that there is inconclusive evidence that Pilates is effective in reducing pain and disability in people with CLBP. This conclusion relates to the insufficient number and methodological quality of available primary studies, rather than the methodological quality of reviews. These findings contrast with other review conclusions: Aladro-Gonzalvo et al. (2012), La Touche et al. (2008) and Lim et al. (2011) report some effectiveness of Pilates, and Pereira et al. (2012) report no effectiveness.
Subsequent systematic reviews need to ensure that conclusions consider the methodological design and quality of primary studies. Meta-analyses and meta-regression analyses should also not be conducted when there is significant clinical and statistical heterogeneity across studies, or when primary studies are few in number. The Revised Assessment of Multiple Systematic Reviews provides a useful method of appraising the methodological quality of systematic reviews. Individual item scores, however, need to be examined in addition to total scores. This will ensure that significant methodological flaws are not missed, and that results of reviews are interpreted appropriately.
CW is a registered physiotherapist undertaking doctoral studies at the University of Western Sydney under the supervision of AB, GSK, and PM. AB is an Associate Professor of Physiotherapy at Griffith University; however, the majority of this work was undertaken while AB was the Foundation Associate Professor and Head of Physiotherapy program at the University of Western Sydney. GSK is the Professor of Health Science and Dean of Science and Health, PM a Senior Lecturer, and CW a Lecturer in the School of Science and Health at the University of Western Sydney. BM is a registered physiotherapist and doctoral student at the University of Western Sydney.
AMSTAR: Assessment of Multiple Systematic Reviews
CINAHL: Cumulative Index to Nursing and Allied Health Literature
CLBP: Chronic Low Back Pain
ICF: International Classification of Functioning, Disability and Health
NHMRC: National Health and Medical Research Council
PEDro: Physiotherapy Evidence Database
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
R-AMSTAR: Revised Assessment of Multiple Systematic Reviews
SF-36: 36 Item Short Form Health Survey.
School of Science and Health, University of Western Sydney
Griffith Health Institute, Griffith University
National Health and Medical Research Council: NHMRC levels of evidence and grades for recommendations for developers of guidelines. Canberra: National Health and Medical Research Council; 2009.
Evans D: Hierarchy of evidence: A framework for the ranking of evidence evaluating nursing interventions. J Clin Nurs 2003, 12:77–84.
Smith V, Devane D, Begley CM, Clarke M: Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med Res Methodol 2011, 11:15.
La Touche R, Escalante K, Linares MT: Treating non-specific chronic low back pain through the Pilates Method. J Bodyw Mov Ther 2008, 12:364–370.
Lim ECW, Poh RLC, Low AY, Wong WP: Effects of Pilates-based exercises on pain and disability in individuals with persistent non specific low back pain: A systematic review with meta-analysis. J Orthop Sports Phys Ther 2011, 41:70–80.
Pereira LM, Obara K, Dias JM, Menacho MO, Guariglia DA, Schiavoni D, Pereira HM, Cardoso JR: Comparing the Pilates method with no exercise or lumbar stabilisation for pain and functionality in patients with chronic low back pain: Systematic review and meta-analysis. Clin Rehabil 2012, 26:10–20.
Posadzki P, Lizis P, Hagner-Derengowska M: Pilates for low back pain: A systematic review. Complement Ther Clin Pract 2011, 17:85–89.
Aladro-Gonzalvo AR, Araya-Vargas GA, Machado-Diaz M, Salazar-Rojas W: Pilates-based exercise for persistent, non specific low back pain and associated functional disability: A meta-analysis with meta-regression.J Bodyw Mov Ther 2012. http://dx.doi.org/10.1016/j.jbmt.2012.08.003
Wells C, Bialocerkowski A, Kolt GS: Definition of Pilates: A systematic review. Complement Ther Med 2012, 20:253–262.
Endleman I, Critchley DJ: Transversus abdominis and obliquus internus activity during pilates exercises: measurement with ultrasound scanning. Arch Phys Med Rehabil 2008, 89:2205–2212.
Ferreira PH, Ferreira ML, Maher CG, Refshauge K, Herbert R, Hodges PW: Changes in recruitment of transversus abdominis correlate with disability in people with chronic low back pain. Br J Sports Med 2010, 44:1166–72.
Wallwork T, Stanton W, Freke M, Hides J: The effect of chronic low back pain on size and contraction of the lumbar multifidus muscle. Man Ther 2009, 14:496–500.
Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G: From systematic reviews to clinical recommendations for evidence-based health care: Validation of revised assessment of multiple systematic reviews (R-AMSTAR) for grading of clinical relevance. Open Dent J 2010, 4:84–91.
Collins J, Fauser B, Bart CJM: Balancing the strengths of systematic and narrative reviews. Human Reprod Update 2005, 11:103–104.
Cook D, Mulrow C, Haynes R: Systematic reviews: Synthesis of best evidence for clinical decisions. Ann Intern Med 1997, 126:376–380.
Schlesselman JJ, Collins JA: Evaluating systematic reviews and meta-analyses. Semin Reprod Med 2003, 21:95–105.
Charlton JE: Core Curriculum for Professional Education in Pain. 3rd edition. Seattle: International Association of the Study of Pain (IASP) Press; 2005.
Steiner WA, Ryser L, Huber EO, Uebelhart D, Aeschlimann A, Stucki G: Use of the ICF model as a clinical problem-solving tool in physical therapy and rehabilitation medicine. Phys Ther 2002, 82:1098–1107.
Shea B, Grimshaw J, Wells G, Boers M, Andersson N, Hamel C, Porter AC, Tugwell P, Moher D, Bouter LM: Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 2007, 7:10.
Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjanson E, Grimshaw J, Henry DA, Boers M: AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol 2009, 62:1013–1020.
Baker WL, White CM, Cappelleri JC, Kluger J, Coleman C: Understanding heterogeneity in meta-analysis: The role of meta-regression. Int J Clin Practice 2009, 63:1426–1434.
Anderson B: Randomised clinical trial comparing active versus passive approaches to the treatment of recurrent and chronic low back pain. University of Miami; 2005. PhD thesis
Rydeard R, Leger A, Smith D: Pilates-based therapeutic exercise: Effect on subjects with nonspecific chronic low back pain and functional disability: A randomized controlled trial. J Orthop Sports Phys Ther 2006, 36:472–484.
da Fonseca JL, Magini M, de Freitas TH: Laboratory Gait Analysis in Patients with Low Back Pain Before and After a Pilates Intervention. J Sport Rehabil 2009, 18:269–282.
Donzelli S, Di Domenica F, Cova AM, Galletti R, Giunta N: Two different techniques in the rehabilitation treatment of low back pain: a randomized controlled trial. Europa Medicophysica 2006, 42:205–210.
Gagnon LH: Efficacy of Pilates exercises as therapeutic intervention in treating patients with low back pain. University of Tennessee; 2005. PhD thesis
MacIntyre L: The effect of Pilates on patients’ chronic low back pain: A pilot study. Master of Science in Physiotherapy thesis. Johannesburg: University of the Witwatersrand; 2006. Thesis
Vad VB, Bhat AL, Tarabichi Y: The role of back rx exercise program in diskogenic low back pain: A prospective randomized trial. Arch Phys Med Rehabil 2007, 88:577–582.
Gladwell V, Head S, Haggar M, Beneke R: Does a program of Pilates improve chronic non-specific low back pain? J Sport Rehabil 2006, 15:338–350.
O’Brien N, Hanlon N, Meldrum D: Randomised controlled trial comparing physiotherapy and Pilates in the treatment of ordinary low back pain [abstract]. Phys Ther Rev 2006, 11:224–225.
Roach K, Carreras K, Lee A, Reed L, Zimmerman G: Development and reliability of the Miami Back Index. J Orthop Sports Phys Ther 2001, 31:97.
Chapman JR, Norvell DC, Hermsmeyer JT, Bransford RJ, DeVine J, McGirt MJ, Lee MJ: Evaluating common outcomes for measuring treatment success for chronic low back pain. Spine 2011, 36(Suppl):S54–68.
Quinn J: Influence of Pilates-based mat exercise on chronic lower back pain. Boca Raton, Florida: Florida Atlantic University; 2005. PhD thesis
Furlan AD, Pennick V, Bombardier C, van Tulder M: Updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine 2009, 34:1929–1941.
Viera AJ, Garrett JM: Understanding interobserver agreement: The kappa statistic. Fam Med 2005, 37:360–363.
Maher C: Effective physical treatment for chronic low back pain. Orthop Clin North Am 2004, 35:57–64.
Krismer M, van Tulder M: Strategies for prevention and management of musculoskeletal conditions. Low back pain (non-specific). Best Pract Res Clin Rheumatol 2007, 21:77–91.
Godwin M, Ruhland L, Casson I, MacDonald S, Delva D, Birtwhistle R, Lam M, Seguin R: Pragmatic controlled clinical trials in primary care: The struggle between external and internal validity. BMC Med Res Methodol 2003, 3:28.
Davidson M, Keating JL: A comparison of five low back disability questionnaires: Reliability and responsiveness. Phys Ther 2002, 82:8–24.
Hopewell S, McDonald S, Clarke M, Egger M: Grey literature in meta-analyses of randomised trials of health care interventions. Cochrane Database Syst Rev 2007. Issue 2. Art No.: MR000010
Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Decullier E, Easterbrook PJ, Von Elm E, Gamble C, Ghersi D, Ioannidis JPA, Simes J, Williamson PR: Systematic Review of the Empirical Evidence of Study Publication Bias and Outcome Reporting Bias. PLoS One 2008, 3:e3081.
van Tulder MW, Suttorp M, Morton S, Bouter LM, Shekelle P: Empirical evidence of an association between internal validity and effect size in randomized controlled trials of low-back pain. Spine 2009, 34:1685–1692.
Sterne JAC, Gavaghan D, Egger M: Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000, 53:1119–1129.
Thompson SG, Higgins JP: How should meta-regression analyses be undertaken and interpreted? Stat Med 2002, 21:1559–1573.
Lambert PC, Sutton AJ, Abrams KR, Jones DR: A comparison of summary patient-level covariates in regression with individual patient data meta-analysis. J Clin Epidemiol 2002, 55:86–94.
Egger M, Bartlett C, Holenstein F, Sterne J: How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technol Assess 2003, 7:1–76.
Khan KS, Kunz R, Kleijnen J, Antes G: Five steps to conducting a systematic review. J R Soc Med 2003, 96:118–121.
Slavin RE: Best evidence synthesis: An intelligent alternative to meta-analysis. J Clin Epidemiol 1995, 48:9–18.
Tobin MJ, Jabran A: Meta-analysis under the spotlight: Focused on a meta-analysis of ventilator weaning. Crit Care Med 2008, 36:1–7.
Noordzij M, Hooft L, Dekker FW, Zoccali C, Jager KJ: Systematic reviews and meta-analyses: When they are useful and when to be careful. Kidney Int 2009, 76:1130–1136.
Walker E, Hernandez AV, Kattan MW: Meta-analysis: Its strengths and limitations. Cleve Clin J Med 2008, 75:431–440.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.