  • Research article
  • Open access

Methods for network meta-analysis of continuous outcomes using individual patient data: a case study in acupuncture for chronic pain



Network meta-analysis methods, which are an extension of the standard pair-wise synthesis framework, allow for the simultaneous comparison of multiple interventions and consideration of the entire body of evidence in a single statistical model. There are well-established advantages to using individual patient data to perform network meta-analysis and methods for network meta-analysis of individual patient data have already been developed for dichotomous and time-to-event data. This paper describes appropriate methods for the network meta-analysis of individual patient data on continuous outcomes.


This paper introduces and describes network meta-analysis of individual patient data models for continuous outcomes using the analysis of covariance framework. Comparisons are made between this approach and change score and final score only approaches, which are frequently used and have been proposed in the methodological literature. A motivating example on the effectiveness of acupuncture for chronic pain is used to demonstrate the methods. Individual patient data on 28 randomised controlled trials were synthesised. Consistency of endpoints across the evidence base was obtained through standardisation and mapping exercises.


Individual patient data availability avoided the use of non-baseline-adjusted models, allowing instead for analysis of covariance models to be applied and thus improving the precision of treatment effect estimates while adjusting for baseline imbalance.


The network meta-analysis of individual patient data using the analysis of covariance approach is advocated to be the most appropriate modelling approach for network meta-analysis of continuous outcomes, particularly in the presence of baseline imbalance. Further methods developments are required to address the challenge of analysing aggregate level data in the presence of baseline imbalance.



Evidence synthesis tools are increasingly used to pool estimates of treatment effects from multiple randomised controlled trials (RCTs) to inform assessments of comparative effectiveness generally, and particularly in the context of health technology assessment. One such tool is network meta-analysis (NMA, also known as mixed treatment comparisons), which extends standard pair-wise meta-analysis by allowing the simultaneous synthesis of evidence on multiple treatments [1–4]. Most published work focuses on the pooling of aggregate outcome data (AD), but with the increasing availability of individual patient data (IPD), synthesis methods have recently emerged to utilise IPD [5–10]. The use of IPD allows the consistent use of statistical methods across the body of evidence. It also creates added value by offering the potential to reduce and/or explain network heterogeneity, tackle existing evidence inconsistencies [11], and examine subgroup effects in patients where interventions might have an effectiveness profile which differs from that of the wider population [10, 12]. Despite its advantages, only a few methodological studies on the synthesis of IPD in NMA are available in the published literature and even fewer examples of its use within cost-effectiveness (CE) analysis exist [13]. Methods for NMA of IPD have focused mainly on a subset of the available types of outcomes, i.e. binary and time-to-event outcomes [10, 14]. Few publications are dedicated to continuous outcomes [15, 16], an important outcome set in medical applications, as well as in complementary medicine and beyond. Recent publications by Hong et al. [15] and Thom et al. [16] explored and discussed the synthesis of continuous endpoints using IPD in NMA. While the former proposes a framework to pool multiple continuous outcomes under contrast- and arm-based parameterisations, the latter focused mainly on modelling observational evidence available in both IPD and AD formats.
Both papers chose the change from baseline as their continuous outcome for synthesis but did not adjust for baseline values of the outcome, apart from when modelling baseline outcome as a treatment-effect modifier [15]. In this paper we present a model for NMA of IPD on continuous outcomes using the analysis of covariance (ANCOVA) approach which does adjust for baseline outcome data.

Analysis of covariance (ANCOVA), where the outcome at follow-up is modelled whilst adjusting for its baseline value, is the preferred method for estimating treatment effects from continuous outcomes [17–19]. Treatment effect estimates based on ANCOVA methods are the most precise estimates and are robust to chance baseline imbalance. As such, these should be the desired outcome measure for synthesis [20–22]. Unfortunately, ANCOVA results are frequently not reported for individual studies and, therefore, ANCOVA is often not used in the synthesis of aggregate evidence. Instead, sub-optimal methods are used [23–25] such as unadjusted differences in change from baseline or final outcome measures.

When IPD is available from each study, the full set of statistical approaches is available to analysts. Riley et al. [22] discuss different approaches to the synthesis of continuous outcome data when IPD is available in a pair-wise meta-analysis framework. The authors highlight that availability of IPD is crucial to implement the most appropriate modelling approach, the ANCOVA [19, 22]. To our knowledge, such an ANCOVA synthesis model has not yet been extended to and/or explored in the NMA setting.

In this paper we present a Bayesian NMA model for the synthesis of continuous IPD using the ANCOVA framework. The paper aims to ensure that best practice in the analysis of continuous outcome data within individual trials and pairwise meta-analyses is extended to the NMA context. We also aim to illustrate the differences (and similarities) between NMA of IPD when using ANCOVA, change score and final score only approaches. The method presented is applied to a case study of acupuncture for chronic pain. The paper is structured as follows. Section 2 presents the motivating example for the manuscript, describes the evidence available and outlines the analysis undertaken to obtain outcome data for synthesis. Section 3 describes the NMA ANCOVA model for IPD on continuous outcomes, followed by extensions that incorporate treatment effect–covariate interactions. Results of applying the methods described to the motivating dataset are reported in Section 4, which is followed by some discussion topics and concluding remarks in Section 5.

Evidence on the effectiveness of acupuncture for chronic pain in primary care

There is currently a lack of agreement about the effectiveness of acupuncture as a treatment for chronic pain, as reflected in debates about recent UK guidance surrounding its value [26–32]. Acupuncture received a positive recommendation from the National Institute for Health and Care Excellence (NICE) for its use in back pain [26] and headache/migraine [27], while a negative recommendation was given for its use in osteoarthritis in 2008 and 2014 [28]. The methods in this paper were developed as part of a project to improve evidence regarding the effectiveness and CE of acupuncture for chronic non-specific pain to inform decision making in the UK National Health Service [33].


The data used in this study was provided by the Acupuncture Trialists’ Collaboration (ATC) who performed a systematic review in which relevant high quality trials were identified and, for a large proportion of trials, IPD was obtained (29 out of 31 studies) [34, 35]. The dataset available to us comprised 28 out of these 29 RCTs which assessed the effectiveness of acupuncture in three pain conditions: osteoarthritis of the knee (OAK) (7 trials [36–42]), headache, including tension-type headache (TTH) and migraine (6 trials [43–48]) and musculoskeletal conditions, encompassing lower back, shoulder and neck pain (15 trials [49–63]). This dataset comprises 17,512 patients. These studies are summarised in Table 1.

Table 1 Main characteristics of the data and study outcomes used for analysis

The dataset includes 11 trials comparing acupuncture to sham acupuncture, 8 comparing acupuncture and usual care, and 9 comparing all three comparators. The resulting evidence network is presented in Fig. 1.

Fig. 1 Network of RCTs. Legend: In the network, a unique treatment category is indicated by a circle. Arrows between circles indicate that these treatments have been compared in a trial (trials are identified using ‘[]’, numbered according to column ‘ID’ in Table 1). Pain groups: H – Headache/migraine; MSK – Musculoskeletal; OAK – Osteoarthritis of the knee.


One key aspect of the evidence available in this setting is the heterogeneous reporting of relevant outcomes across trials. The ATC dataset varied according to the type of outcomes reported but also on how these were collected across time. To address this issue, two outcome measures are used within this paper. The first is a standardised pain-related outcome, a dimensionless measure of treatment effect usually termed standardised mean difference (SMD) [20, 21, 64, 65]. For this analysis the primary outcome of each study was used to generate patient-level standardised pain estimates. Pain measures varied from days with headache in the headache/migraine pain condition, to visual analogue scale (VAS) pain in the musculoskeletal group or to Western Ontario and McMaster Universities Arthritis Index (WOMAC) pain in the OAK group, as reported in Table 1 (column on the right hand side). Individual-level standardised pain estimates were obtained for each trial by dividing the primary outcome scores by the study-specific standard deviation. Note that while these estimates were used as inputs in the synthesis models, the outputs of the synthesis are in the SMD format, as differences between treatments were estimated within the modelling (Footnote 1).
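The standardisation step described above can be sketched as follows. This is a minimal illustration only: the function name and the example scores are hypothetical, and it assumes the study-specific standard deviation is the sample SD of the scores supplied for that trial (the text does not state which time point or arm the SD was taken from).

```python
from statistics import stdev


def standardise_scores(scores_by_study):
    """Divide each patient's score by the SD of the scores in the same study.

    ASSUMPTION: the study-specific SD is the sample SD (n - 1 denominator)
    of the supplied scores; the source does not specify how it was computed.
    """
    standardised = {}
    for study, scores in scores_by_study.items():
        sd = stdev(scores)
        standardised[study] = [score / sd for score in scores]
    return standardised


# Hypothetical primary-outcome scores measured on two different scales
example = {
    "trial_vas": [20.0, 40.0, 60.0],   # e.g. a 0-100 visual analogue scale
    "trial_womac": [2.0, 4.0, 6.0],    # e.g. a scale with a smaller range
}
result = standardise_scores(example)
```

After standardisation the two hypothetical trials are on a common, dimensionless scale, which is what allows outcomes from different instruments to enter the same synthesis model.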

While SMDs may be useful for detecting differences between interventions, they are of limited value to decision making as these cannot directly inform estimates of absolute effect or CE modelling, unless they are first transformed [20]. These considerations motivated the second synthesis approach used, which involved translating (or ‘mapping’) the available patient-reported outcome data from the trials into EuroQol five-dimension (EQ-5D) index values [66]. The EQ-5D is a popular preference-based generic health-related quality of life (HRQoL) measure, typically employed to weight life years gained and thus derive quality-adjusted life-years (QALYs) for use in CE analysis [67]. The EQ-5D preference score was the second outcome explored. Due to its importance in supporting health system decision making processes, the EQ-5D preference score, used in CE analysis, has applications in many jurisdictions worldwide, including the UK [68]. The conventional EQ-5D questionnaire includes five domains, each of which can be at one of three severity levels. Using an algorithm, responses to this questionnaire can be transformed to a numeric value that reflects the preferences of the public for different health states (here we used values from the UK general public [69]). Values range from −0.594 to 1 (the bounds represent, respectively, the worst imaginable health state and full health, with zero relating to death).

Only a small number of trials (n = 2) in the dataset directly provided EQ-5D data [36, 56]. Where such data were not available, they were predicted from other generic and disease-specific measures (Footnote 2; Table 1) through published mapping algorithms. In 50 % (n = 14) of the trials, well-established published algorithms were used to map from Short Form (SF)-36 dimensions and SF-12 summary scores to EQ-5D (Footnote 3) [58, 70, 71]. In 10 of the 28 trials, published algorithms which map VAS pain scores [72] and WOMAC scores [73] to EQ-5D were used (Footnote 4). For one trial, a double mapping approach was necessary as, to our knowledge, no direct mapping algorithm exists to obtain EQ-5D values from the Constant Murley Score (CMS). Thus, an unpublished mapping algorithm (a report describing the derivation of the mapping algorithm is available on request from Kamran Khan: [74]) was used to derive VAS pain estimates from CMS, which were then used to obtain individual-level EQ-5D predictions using the Maund et al. [72] algorithm.

For the majority of mapping models used, the proportion of total variation explained (quantified by the coefficient of determination, \( {R}^2 \), in most cases) was low. To account for this additional source of uncertainty, an additional variance component was included (Footnote 5) [75]. This was achieved by adding to each individual-level EQ-5D prediction a draw from a normal distribution with mean zero and variance equal to the study-specific residual variance, that is, \( Var\left[\widehat{EQ5D}\right]\cdot \left(1-{R}^2\right) \), where \( \widehat{EQ5D} \) is the predicted (mapped) EQ-5D at individual-level.
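The added variance component can be illustrated with a short sketch. It is not the authors' code: the function name, the seed argument and the example values are hypothetical, and it assumes that \( Var\left[\widehat{EQ5D}\right] \) is the variance of the mapped predictions within a study and that \( {R}^2 \) is taken from the published mapping model.

```python
import random
from statistics import pvariance


def add_mapping_uncertainty(predicted, r_squared, seed=None):
    """Add a N(0, Var[predicted] * (1 - R^2)) draw to each mapped EQ-5D value.

    ASSUMPTION: Var[predicted] is the (population) variance of the mapped
    values within one study; r_squared comes from the mapping algorithm.
    """
    rng = random.Random(seed)
    residual_sd = (pvariance(predicted) * (1.0 - r_squared)) ** 0.5
    return [value + rng.gauss(0.0, residual_sd) for value in predicted]


# Hypothetical mapped EQ-5D predictions for four patients in one study
mapped = [0.55, 0.62, 0.70, 0.81]
noisy = add_mapping_uncertainty(mapped, r_squared=0.4, seed=1)
```

A lower \( {R}^2 \) gives a larger added variance, so poorly fitting mapping models contribute more uncertainty to the synthesis; with \( {R}^2=1 \) the predictions are returned unchanged.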

HRQoL and standardised pain estimates were obtained at baseline and at the follow-up period closest to 3 months following the start of treatment, as 3 months is the typical end of treatment measurement, though not necessarily the trial’s primary end-point. Changes from baseline were obtained by calculating the difference between values for these two time points. Missing data in the ATC dataset (9.3 % (n = 1,622) and 15.5 % (n = 2,716) of the total number of patients in the standardised pain and HRQoL outcome, respectively) was assumed to be missing at random (MAR) and a complete-case analysis was conducted.

Additional file 1: Table A1 presents the standardised pain outcome and (mapped/predicted) EQ-5D data. For both outcome measures baseline imbalance can be observed in some trials. The source of this imbalance is not clear, but should be addressed in the synthesis [76, 77].


Statistical models for the data

All analyses were conducted using Bayesian methods. A contrast-based modelling approach is taken throughout featuring relative treatment effects, in line with the parameterisation used by Lu and Ades [78], Saramago et al. [10] and others. A one-step modelling approach, where the likelihood for data at the IPD level and that of parameter estimates were described simultaneously, was preferred because we intended to explore treatment-by-covariate interactions at the patient-level [7, 79]. Note that all four models described below include pain type interactions which are specific to the current case-study. Table 2 summarises the key characteristics of the four models implemented, highlighting existing differences across these.

Table 2 Summary of key characteristics of implemented models

ANCOVA analysis (model 1)

The main modelling approach considered (model 1) is a variation of the ANCOVA approach that models the change score adjusting for baseline outcome values and with no stratification variables [19, 22, 80, 81] – such an approach is seen as equivalent to the existing ANCOVA approach.

The model considers a set of J studies for which IPD was available. The set of treatments included in these trials are labelled [A,B,C], where A is the reference treatment and there are K (=3) treatments in total. At baseline, patient i in study j allocated to treatment k provides a baseline measurement \( {Y}_{ijk0} \) (where 0 indicates time t at baseline). Each patient provides a follow-up measurement (the assessment closest to 3 months), \( {Y}_{ijk3} \). The change from baseline (\( {Y}_{ijk3}-{Y}_{ijk0} \)) is denoted \( \Delta {Y}_{ijk} \) and is assumed normally distributed with mean \( {\theta}_{ijk} \) and study-level variance \( {V}_j \).

The mean \( {\theta}_{ijk} \) is assumed to be a function of: \( {\mu}_{jb} \), the outcome for treatment b (the lowest indexed treatment in each study) in study j for a patient with a baseline utility, \( {Y}_{ijk0} \), of 0; \( {\beta}_{0j} \), the study-specific coefficient on the baseline outcome; \( {\delta}_{jbk} \), the study-specific treatment effect for treatment k relative to treatment b; and \( {X}_{jp} \), p − 1 dummy variables representing pain type p in the jth study. The latter terms were included to allow treatment effects to vary according to pain type (i.e. OAK; headache, including TTH and migraine; and musculoskeletal conditions, including lower back, shoulder and neck pain). There are different ways in which interaction effects can be specified in NMAs [82]. For this example we assumed that pain-by-treatment interaction effects, \( {\beta}_{bkp} \), were different for each treatment but exchangeable across treatments. Estimates of \( {\beta}_{bkp} \) were therefore assumed to be drawn from a normal distribution with a common mean (\( {B}_p \)) and between-treatment variance (\( {\sigma}_{Bp}^2 \)). An exchangeable interaction approach for pain was thought to be the most appropriate as it allowed pain interactions to be different across treatments but related. Pain interaction effects were not included for OAK as this is used as the reference pain indication. Pain interaction terms were specific to the current application and may be excluded if not of interest. However, we emphasise that adjustment for baseline should always be included regardless of the need to model interactions.

A random treatment effect approach was taken due to the expected between-study heterogeneity, the variance of which is described as \( {\sigma}^2 \).

This model can be written as:

$$ \begin{array}{l}\Delta {Y}_{ijk}\sim N\left({\theta}_{ijk},{V}_j\right)\\ {}{\theta}_{ijk}=\left\{\begin{array}{ll}{\mu}_{jb}+{\beta}_{0j}{Y}_{ijk0} & \mathrm{if}\ k=b;\ b\in \left\{A,B,C,\dots \right\}\\ {}{\mu}_{jb}+{\beta}_{0j}{Y}_{ijk0}+{\delta}_{jbk}+{\beta}_{bkp}{X}_{jp} & \mathrm{if}\ k>b\end{array}\right.\\ {}{\delta}_{jbk}\sim N\left({d}_{bk},{\sigma}^2\right)=N\left({d}_{Ak}-{d}_{Ab},{\sigma}^2\right)\\ {}{\beta}_{bkp}={\beta}_{Akp}-{\beta}_{Abp}\\ {}{\beta}_{Akp}\sim N\left({B}_p,{\sigma}_{Bp}^2\right)\\ {}{d}_{AA}=0,\kern0.5em {\beta}_{AAp}=0\end{array} $$

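Prior distributions were defined independently as follows: \( 1/{V}_j\sim \mathrm{Gamma}\left(0.001,0.001\right) \); \( {\mu}_{jb}\sim N\left(0,{10}^6\right) \); \( {\beta}_{0j}\sim N\left(0,{10}^6\right) \); \( {d}_{Ak}\sim N\left(0,{10}^6\right) \); \( \sigma \sim \mathrm{Unif}\left(0,2\right) \); \( {B}_p\sim N\left(0,{10}^6\right) \); \( {\sigma}_{Bp}\sim \mathrm{Unif}\left(0,2\right) \). Correlations in the random effects from trials with three or more arms were accounted for using published methodology [3, 64]. In this paper, k > b indicates that k is after b in the alphabet.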
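To make the structure of model 1 concrete, the sketch below evaluates the mean change from baseline for a single patient and the consistency relation between pooled effects. It is an illustration under stated assumptions rather than the fitted model: the function names and all parameter values are hypothetical, the parameters are treated as known numbers (in the actual analysis they are estimated by MCMC), and the pain-type interaction is written as a sum over the dummy variables.

```python
def consistency_effect(d_A, b, k):
    """Pooled effect of k relative to b: d_bk = d_Ak - d_Ab (with d_AA = 0)."""
    return d_A[k] - d_A[b]


def theta_model1(mu_jb, beta_0j, y_baseline, is_baseline_arm,
                 delta_jbk=0.0, beta_bkp=(), x_jp=()):
    """Mean change from baseline for one patient under model 1."""
    # Study baseline-arm term plus adjustment for the patient's baseline value
    theta = mu_jb + beta_0j * y_baseline
    if not is_baseline_arm:
        # Study-specific treatment effect plus pain-type interaction terms
        theta += delta_jbk + sum(beta * x for beta, x in zip(beta_bkp, x_jp))
    return theta


# Hypothetical pooled effects relative to reference A (not results from the paper)
d_A = {"A": 0.0, "B": 0.02, "C": 0.08}
d_BC = consistency_effect(d_A, "B", "C")

# Two hypothetical patients with the same baseline value in the same study
control = theta_model1(0.30, -0.5, 0.6, is_baseline_arm=True)
treated = theta_model1(0.30, -0.5, 0.6, is_baseline_arm=False,
                       delta_jbk=0.08, beta_bkp=(-0.02, 0.01), x_jp=(1, 0))
```

Because both patients share the same baseline value, the difference `treated - control` is the treatment effect plus the relevant pain-type interaction, which is the quantity the baseline adjustment is designed to isolate.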

Controlling for treatment effect modifying patient-level characteristics (model 2)

For the EQ-5D endpoint, model 1 was extended to include patient-level covariates as potential treatment effect modifiers. Clinical expectations were that older age or higher body mass index (BMI) may make patients more difficult to treat and, thus, potentially reduce the effect of treatment. Data on age were available from most studies and it was included as a covariate (centred) in the synthesis model. Again, a range of approaches can be used to incorporate treatment-effect interactions. In this analysis we assumed a common effect across pain types and for both acupuncture and sham acupuncture (i.e. a single interaction term is assumed to apply to all comparisons with usual care) [82] as this was deemed more clinically plausible. A non-linear effect of age was expected a priori, and thus squared terms were included for both main effects and treatment interaction effects. BMI data were only available in 10 of the 28 studies and for this reason we did not explore this variable further.

Model 2 thus differs from model 1 in that it considers the effects of the covariate Z (age). Differences to model 1 are shown below:

$$ \begin{array}{l}{\theta}_{ijk}=\left\{\begin{array}{cc}\hfill {\mu}_{jb}+{\beta}_{0j}{Y}_{ijk0}+{\phi}_0{Z}_{ijk}+{\varphi}_0{Z}_{ijk}^2\hfill & \hfill if\ k=b;\ b\in \left\{A,B,C,\dots \right\}\hfill \\ {}\hfill {\mu}_{jb}+{\beta}_{0j}{Y}_{ijk0}+{\phi}_0{Z}_{ijk}+{\varphi}_0{Z}_{ijk}^2+{\delta}_{jbk}+{\beta}_{bkp}{X}_{jp}\hfill & \hfill if\ k>b\ \mathrm{and}\ b\ne A\hfill \\ {}\hfill {\mu}_{jb}+{\beta}_{0j}{Y}_{ijk0}+{\phi}_0{Z}_{ijk}+{\varphi}_0{Z}_{ijk}^2+\phi {Z}_{ijk}+\varphi {Z}_{ijk}^2+{\delta}_{jbk}+{\beta}_{bkp}{X}_{jp}\hfill & \hfill if\ k>b\ \mathrm{and}\ b=A\hfill \end{array}\right.\\ {}{Z}_{ijk}\sim N\left(m,{\sigma}_Z^2\right)\end{array} $$

Coefficients on the main covariate effect and the effect squared are represented by \( {\phi}_0 \) and \( {\varphi}_0 \). Coefficients on the treatment-by-covariate interaction term and the interaction between treatment and the squared covariate term are represented by \( \phi \) and \( \varphi \). No interaction term for comparisons of k and b was included when b ≠ A because the common regression coefficient cancels out.
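The cancellation can be seen by applying the consistency relation to the covariate interaction in the same way as to the treatment effect. As a sketch, write \( g(Z)=\phi Z+\varphi {Z}^2 \) (a shorthand introduced here for illustration, not notation from the source) for the interaction shared by each active treatment relative to the reference A, with no interaction for A itself:

$$ {g}_{bk}(Z)={g}_{Ak}(Z)-{g}_{Ab}(Z)=\left\{\begin{array}{ll}g(Z)-0=g(Z) & \mathrm{if}\ b=A\\ {}g(Z)-g(Z)=0 & \mathrm{if}\ b\ne A\end{array}\right. $$

Under the assumption of a single interaction common to acupuncture and sham acupuncture, the age terms therefore only enter comparisons against usual care.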

Due to the possibility of missing covariate information for some individuals in some studies, \( {Z}_{ijk} \) was assumed to be a normally distributed random variable with mean m and variance \( {\sigma}_Z^2 \), common across all IPD studies. This represents a Markov chain Monte Carlo (MCMC) multiple imputation technique which generates independent draws of the missing data from its predictive distribution assuming MAR covariate data. Additional priors were required for this model: \( {\phi}_0,\phi, {\varphi}_0,\varphi \sim N\left(0,{10}^6\right) \); \( m\sim \mathrm{Unif}\left(-50,50\right) \); \( {\sigma}_Z\sim \mathrm{Unif}\left(0,30\right) \).

Analysis with restricted evidence (models 3 and 4)

Although model 1 is the preferred choice, this model would not be feasible in the absence of outcome information at the individual-level for both baseline and follow-up time points. Sub-optimal models which do not rely on the availability of IPD were therefore run for comparison purposes. Three options are typically available to the analyst when only AD are available [22] – i) in the event of ANCOVA estimates being available, synthesise these using published literature [22]; or ii) model the change score without baseline adjustment (model 3); or iii) model the final outcome score without baseline adjustment (model 4). We note that – though suboptimal - model 3 has also been presented in the context of an NMA of continuous outcomes when IPD were available [15, 16].

Models 3 and 4 are simplifications of model 1 where the baseline outcome variable is omitted and where model 4 considers the final score, rather than the change score, as dependent variable. The synthesis of data using models that ignore baseline outcomes may provide biased treatment effect estimates because of potential baseline imbalances (unless addressed within trials themselves) and due to ignoring potential correlation between the change/final score and the baseline value [77, 83]. It may also reduce the precision of treatment effect estimates, even if balance at baseline is observed across all synthesised evidence [22].
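The consequence of omitting the baseline adjustment can be illustrated within a single two-arm trial. The sketch below is a simplified, non-Bayesian illustration with hypothetical, noise-free data; it is not the NMA model. It uses the standard least-squares result that the ANCOVA treatment effect equals the difference in follow-up means minus the pooled within-arm slope multiplied by the baseline imbalance (regressing the change score on baseline, as in model 1, gives the same treatment effect).

```python
from statistics import mean


def treatment_effects(baseline, follow_up, treated):
    """Return (ancova, change_score, final_score) estimates for one trial."""
    def arm(values, flag):
        return [v for v, t in zip(values, treated) if t == flag]

    x0, x1 = arm(baseline, 0), arm(baseline, 1)
    y0, y1 = arm(follow_up, 0), arm(follow_up, 1)

    imbalance = mean(x1) - mean(x0)      # baseline difference between arms
    final = mean(y1) - mean(y0)          # final score analysis (cf. model 4)
    change = final - imbalance           # change score analysis (cf. model 3)

    # Pooled within-arm slope of follow-up on baseline
    sxy = sxx = 0.0
    for xs, ys in ((x0, y0), (x1, y1)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx

    ancova = final - slope * imbalance   # baseline-adjusted (cf. model 1)
    return ancova, change, final


# Hypothetical data: follow-up = 0.2 + 0.5 * baseline + 0.1 * treated,
# with the treated arm starting 0.1 higher at baseline
baseline = [0.4, 0.5, 0.6, 0.5, 0.6, 0.7]
treated = [0, 0, 0, 1, 1, 1]
follow_up = [0.2 + 0.5 * x + 0.1 * t for x, t in zip(baseline, treated)]
ancova, change, final = treatment_effects(baseline, follow_up, treated)
# approximately (0.10, 0.05, 0.15): only ANCOVA recovers the effect of 0.1
```

In this constructed example the change score understates and the final score overstates the effect; the direction and size of the discrepancy depend on the sign of the imbalance and on the baseline slope, so other data can give the opposite pattern.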

Calculating the residual deviance

The total residual deviance, TRD – a measure of model fit – can be estimated for each of the described models by summing study-level residual deviances, RD. Each study-level RD is the sum, across the patients in that study, of the squared differences between the observed changes from baseline, \( \Delta {Y}_{ijk} \), and the estimated mean, \( {\theta}_{ijk} \), divided by the study-level variance, \( {V}_j \) [84]:

$$ \begin{array}{l}{D}_{ijk}={\left(\Delta {Y}_{ijk}-{\theta}_{ijk}\right)}^2\\ {}R{D}_j={\displaystyle \sum_{i,k}{D}_{ijk}}/{V}_j\\ {}TRD={\displaystyle \sum_jR{D}_j}\end{array} $$

For a model that fits the data well, the contributions to the RD are assumed to follow a chi-squared distribution with N degrees of freedom when summed over N unconstrained data points. On this basis, the posterior mean of the TRD is expected to be close to the number of unconstrained data points if the model predictions are a good fit to the data [20, 84, 85].
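The residual deviance calculation can be sketched for a single set of parameter values as follows. The function name, data structure and numbers are hypothetical; in a Bayesian analysis the same quantity would be evaluated at every MCMC iteration and summarised by its posterior mean.

```python
def total_residual_deviance(studies):
    """Return (TRD, list of study-level RDs).

    Each study is a dict with 'observed' (changes from baseline),
    'fitted' (estimated means theta) and 'variance' (V_j).
    """
    study_rd = []
    for study in studies:
        squared = sum((y - theta) ** 2
                      for y, theta in zip(study["observed"], study["fitted"]))
        study_rd.append(squared / study["variance"])
    return sum(study_rd), study_rd


# Hypothetical data for two small studies
studies = [
    {"observed": [0.1, 0.3], "fitted": [0.2, 0.2], "variance": 0.01},
    {"observed": [0.5, 0.5, 0.8], "fitted": [0.6, 0.6, 0.6], "variance": 0.02},
]
trd, rd = total_residual_deviance(studies)
# trd is approximately 5.0, equal to the 5 data points in this example
```

Comparing `trd` with the number of data points mirrors the rule of thumb described above: values close to N are consistent with an adequate fit.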

Model selection and implementation

Data management was performed in the freely available software package R version 3.0.0 (Copyright © 2013 The R Foundation for Statistical Computing [86]). The NMA analyses were undertaken in WinBUGS [87] version 1.4.3 (Copyright © 2008 Medical Research Council (UK) and Imperial College (UK)), linked to the R software through the packages R2WinBUGS [88] and CodaPkg [89]. Annotated code, sample data and initial values for model 1 are provided in the Additional file 2 to allow readers to adapt it for their own purposes.

In all models the MCMC Gibbs sampler was initially run for 10,000 iterations and these were discarded as ‘burn-in’. Models were run for a further 5,000 iterations, on which inferences were based. Chain convergence was checked using autocorrelation and Brooks-Gelman-Rubin diagram [90, 91] diagnostics. Goodness of fit was assessed using the deviance information criterion (DIC) and TRD [84]. Results are presented as EQ-5D preference scores and SMD treatment effect estimates (and associated 95 % credibility intervals, CrIs) and also using the probability of treatment being the ‘best’ treatment in terms of being the most clinically effective [4].
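The two post-processing steps mentioned above, discarding burn-in iterations and computing the probability that each treatment is 'best', can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names and the posterior draws are hypothetical, and ties (if any) are resolved in favour of the treatment listed first.

```python
def discard_burn_in(chain, burn_in=10_000):
    """Drop the first `burn_in` iterations of an MCMC chain."""
    return chain[burn_in:]


def probability_best(samples, higher_is_better=True):
    """Proportion of iterations in which each treatment ranks first.

    `samples` maps treatment name -> list of posterior draws of its effect
    relative to the reference; all lists must have the same length.
    """
    names = list(samples)
    n_iter = len(samples[names[0]])
    pick = max if higher_is_better else min
    wins = {name: 0 for name in names}
    for i in range(n_iter):
        best = pick(names, key=lambda name: samples[name][i])
        wins[best] += 1
    return {name: wins[name] / n_iter for name in names}


# Hypothetical posterior draws of EQ-5D effects relative to usual care
draws = {
    "usual care": [0.0, 0.0, 0.0, 0.0],
    "sham acupuncture": [0.05, 0.01, 0.02, 0.09],
    "acupuncture": [0.08, 0.07, 0.01, 0.10],
}
p_best = probability_best(draws)
# acupuncture is best in 3 of the 4 hypothetical iterations
```

For outcomes where lower values are better, `higher_is_better=False` selects the smallest draw at each iteration instead.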


ANCOVA analysis results (model 1)

Table 3 and Fig. 2 show the evidence from model 1 on relative treatment effect estimates adjusted for baseline and treatment-by-pain interaction effects (medians of the MCMC posterior samples and 95 % CrI shown). Measures of model fit (TRD and DIC) are also shown. The reference category for the pain interaction effects is the OAK pain type.

Fig. 2 Forest plot showing network meta-analysis results for standardised pain and EQ-5D outcomes

For both endpoints, model 1 indicates that acupuncture treatment increases the HRQoL of patients and/or reduces pain more than usual care and sham acupuncture treatments, irrespective of pain group. For the EQ-5D endpoint the treatment effect of acupuncture vs. usual care in the OAK population is 0.079 (median, 95 % CrI: 0.042 to 0.114), for headache/migraine and musculoskeletal pain patients the comparable treatment effects are 0.056 (median, 95 % CrI: 0.021 to 0.092) and 0.082 (median, 95 % CrI: 0.047 to 0.116), respectively. The results also favour acupuncture over sham acupuncture, although with a greater degree of uncertainty, as reflected by the fact that CrIs include zero for all pain types (OAK: 0.022, 95 % CrI −0.014 to 0.060; headache/migraine: 0.004, 95 % CrI −0.035 to 0.042; and musculoskeletal 0.023, 95 % CrI −0.008 to 0.053). The probability that acupuncture is the best treatment at improving HRQoL is 0.89 for OAK, 0.64 for headache/migraine and 0.95 for musculoskeletal pain.

Results for the SMD endpoint followed a similar pattern. However, in contrast to the EQ-5D analysis, the CrIs for the acupuncture vs. sham acupuncture comparison do not include zero in the standardised pain analysis for the OAK (0.438, 95 % CrI 0.121 to 0.715) and musculoskeletal (0.527, 95 % CrI 0.323 to 0.735) pain types, though they do for headache/migraine (0.256, 95 % CrI −0.073 to 0.560). The probability that acupuncture is the best treatment at improving standardised pain is 0.96 to 1.00 depending on pain type.

It is interesting to note that sham acupuncture vs. usual care treatment effect 95 % CrIs across pain types do not include 0 in the EQ-5D endpoint but they do for SMD, except for the headache/migraine group. These results suggest that sham acupuncture effects may well go beyond pain. Also interesting is the estimated magnitude of the uncertainty over the pain type interactions (not reported) as these, particularly for the EQ-5D endpoint, do not provide strong evidence of a difference between pain types.

Expectations were that some level of heterogeneity existed between-trials. Possibly as a consequence of the mapping work performed, this expectation was not fulfilled for the EQ-5D endpoint (the between-study variance estimate is 0.001). For the SMD endpoint the between study variance was also small relative to the magnitude of the treatment effects (the between-study variance estimate is 0.09). The TRD suggests that the models provide an adequate fit to the data (see Table 3).

Table 3 IPD NMA ANCOVA synthesis model results (model 1), EQ-5D preference score and standardised pain endpoints

Controlling for patient-level characteristics (model 2)

Table 1 provides information on age for each of the trials included in the dataset. The average age was lower in the headache/migraine pain group than in the musculoskeletal group, which in turn was lower than the OAK group.

Using the change in EQ-5D as the outcome for synthesis, Table 4 presents the results of applying model 2 (an extension of model 1) to include patient-level information on age, with age considered as a potential treatment effect modifier. The model fit statistics show that the age-adjusted model is marginally better than model 1, providing lower DIC statistics and reduced posterior RD. The results are very similar to model 1 and do not suggest that age is a strong effect modifier or that non-linear effects of age on the effect of treatments are present.

Table 4 IPD NMA ANCOVA synthesis model (model 2) results with adjustments, EQ-5D preference endpoint

Analysis with restricted evidence (models 3 and 4)

Models 3 and 4 model the change score and the final outcome score, respectively. These are seen as simplifications of model 1 where no baseline adjustment is done. Results for models 3 and 4 are presented in Table 5, together with model 1 results for comparison. Generally, all three models convey the same message in relation to which treatment provides higher increases in patients’ HRQoL; that is, acupuncture is found to be better than sham and usual care treatments. As expected, models 3 and 4 (model 3 in particular) provide different summary results of treatment effects when compared to model 1. Compared with the ANCOVA model (model 1), model 3, the change score approach, generally inflates the summary treatment effects across pain types, with potential losses in precision (e.g. for OAK the median EQ-5D treatment effect is inflated 19 % in model 3 compared to model 1 for the acupuncture vs usual care comparison). Compared to model 1, model 4 summary treatment effects are generally similar or lower; CrIs are however consistently wider in model 4 compared to model 1.

Table 5 IPD NMA results for models (1), (3) and (4), EQ-5D preference score endpoint


This study presents methods for conducting NMA of IPD on continuous outcomes, building on previous work on ANCOVA models for pairwise meta-analysis [22]. IPD availability avoided the use of non-baseline-adjusted models, allowing for ANCOVA models to be applied, thus improving precision of treatment effect estimates while adjusting for baseline imbalance [22]. Our results generalise the findings from Riley et al. [22] to the NMA setting and reinforce the idea that different approaches to the synthesis of continuous outcomes will produce different results. The ANCOVA approach is advocated to be the most appropriate modelling approach. Due to limited reporting of ANCOVA results in trial publications, IPD will typically be required to facilitate implementation of the ANCOVA NMA approach. The appropriate analysis of continuous endpoints therefore provides a further rationale for obtaining access to IPD, in addition to those well documented in the NMA literature [10, 12, 15, 92].

Recent work by Hong et al. [15] and Thom et al. [16] presented and discussed IPD NMA models for continuous outcomes. While Hong and colleagues [15] introduced contrast-based and arm-based models for multiple outcomes, Thom et al. [16] synthesised AD and IPD, some of which was observational rather than RCT data. They also considered interactions between treatment effects and covariates. The existence of ecological bias was explored in Hong et al. [15] by partitioning within- and across-study interactions [10]. Both publications used the change from baseline as their continuous outcome measure. In both publications models were presented that did not incorporate an adjustment for baseline outcome values, and in Hong et al. [15] adjustment for baseline outcome values was only considered in the context of modelling baseline outcomes as a treatment effect modifier. Thom et al. [16] recognised that the approach taken was not the recommended one, but noted that an ANCOVA-type approach was not possible as, for most studies in their motivating example, only AD was accessible to them. Our work emphasises that where IPD is available, all models of continuous outcomes should include adjustment for the baseline outcome, and unadjusted models should not be presented.

Analyses in this paper were conducted to explore the implications of using non-ANCOVA models in an NMA framework, as other methods have been used in the literature [15] to analyse continuous outcome IPD, and these methods are often necessary in the absence of IPD. The results differed from those of the ANCOVA model. Modelling final scores or change scores without baseline adjustment produced estimates of treatment effect which differed by up to 19 % from those of the baseline-adjusted model. By explicitly accounting for correlation between the change score and the baseline score in the presence of baseline imbalance, the tested ANCOVA model (model 1) avoids bias in the pooled treatment effect estimates. These results emphasise how important it is to adjust for baseline to adequately synthesise evidence in this setting, a task very much facilitated by the availability of IPD. We hope that highlighting the consequences of using suboptimal models may encourage readers to obtain IPD so that the most appropriate methods may be implemented. When IPD is available, ANCOVA should always be used. There has been discussion in the literature of the idea that final or change score analyses may ‘bound’ the true relative effect estimate. Although this may be true for a single trial, it may not hold for NMA models [18]. This emphasises the importance of conducting appropriate analyses, as the potential direction of bias is difficult to predict. Any bias in treatment effect or impact on precision could lead to inappropriate decisions regarding adoption and further research.
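To make the contrast between the three analyses concrete, the following minimal sketch simulates a single two-arm trial with a deliberately imposed baseline imbalance and compares the final score, change score and ANCOVA estimates of the treatment effect. All numerical values (sample size, true effect `theta`, baseline slope `beta`, size of the imbalance) and both function names are illustrative assumptions, not quantities from the acupuncture data, and the ordinary least-squares fit is only a stand-in for the Bayesian NMA models used in this paper.

```python
import numpy as np


def simulate_trial(n_per_arm=20000, theta=-1.0, beta=0.5, imbalance=0.4, seed=1):
    """Simulate one two-arm trial (all parameter values are hypothetical).

    theta     : true treatment effect on the follow-up score
    beta      : slope of follow-up on baseline (between 0 and 1 here)
    imbalance : shift in mean baseline score in the treated arm
    """
    rng = np.random.default_rng(seed)
    treat = np.repeat([0, 1], n_per_arm)
    n = 2 * n_per_arm
    baseline = rng.normal(5.0, 1.0, size=n) + imbalance * treat
    follow_up = 1.0 + beta * baseline + theta * treat + rng.normal(0.0, 1.0, size=n)
    return treat, baseline, follow_up


def treatment_effect_estimates(treat, baseline, follow_up):
    """Return final score, change score and ANCOVA estimates of the effect."""
    treated = treat == 1
    change = follow_up - baseline

    # Final score: difference in mean follow-up, ignoring baseline.
    final_est = follow_up[treated].mean() - follow_up[~treated].mean()

    # Change score: difference in mean change from baseline.
    change_est = change[treated].mean() - change[~treated].mean()

    # ANCOVA: regress follow-up on an intercept, baseline and treatment.
    design = np.column_stack([np.ones_like(baseline), baseline, treat])
    coef, *_ = np.linalg.lstsq(design, follow_up, rcond=None)

    return {"final": final_est, "change": change_est, "ancova": coef[2]}


if __name__ == "__main__":
    estimates = treatment_effect_estimates(*simulate_trial())
    for name, value in estimates.items():
        print(f"{name:>6}: {value:.3f}")
```

Under this data-generating model the final score difference is approximately `theta + beta * imbalance` and the change score difference approximately `theta + (beta - 1) * imbalance`, so with a slope between 0 and 1 the two unadjusted estimates fall on either side of the ANCOVA estimate. This is the single-trial ‘bounding’ behaviour referred to above, which need not carry over to an NMA.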

The motivating example related to the effectiveness of acupuncture for the treatment of chronic pain. The analyses found acupuncture to be more effective than usual care with respect to reducing pain and improving EQ-5D preference scores in patients with chronic pain of OAK, musculoskeletal and headache/migraine origin. The benefits of acupuncture over sham acupuncture were smaller than its benefits over usual care. The methods used provided outputs in a format that can be used to directly inform CE considerations once the full set of relevant comparators is considered.

A recent study by Vickers et al. [35] also explored the effectiveness of acupuncture for chronic pain. That study performed an IPD pair-wise meta-analysis using the same data plus data from an additional trial [93], which, due to lack of consent, were not available for the current analysis. Using study-specific primary outcome measures and the ANCOVA methodology, the Vickers et al. [35] study conducted meta-analyses separately for comparisons of acupuncture with sham acupuncture and usual care, and within each pain type. Despite the methodological differences, and differences for some trials in the choice of primary outcome measure and/or primary end point, the findings of the two studies are similar.

The instruments used to measure health outcomes differed between trials. Standardisation and mapping approaches were used to derive pain-related outcomes and EQ-5D, respectively. Analysis of the pain outcome required development of methods for conducting standardised mean difference analysis with IPD. Analysis of the EQ-5D data required an extensive mapping exercise whereby separate mapping functions were applied to each study, with the choice of mapping dependent on the available outcome data. Access to IPD in this context also avoided the need for any assumptions regarding the distribution of HRQoL instrument scores, thus allowing the observed distributions to be adequately reflected in the mapped EQ-5D estimates.
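As an illustration of what patient-level standardisation can look like, the sketch below divides each patient's pain score by a trial-specific standard deviation so that trials using different instruments are placed on a common scale. The choice of the baseline standard deviation as the scaling factor, and the function name `standardise_by_trial`, are assumptions made for this example only; the sketch is not a description of the exact standardisation applied in this paper.

```python
import numpy as np


def standardise_by_trial(scores, baseline, trial_ids):
    """Divide each score by the SD of baseline scores within its own trial.

    Hypothetical helper: the scaling factor (within-trial baseline SD) is an
    assumption chosen for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    trial_ids = np.asarray(trial_ids)

    standardised = np.empty_like(scores)
    for trial in np.unique(trial_ids):
        in_trial = trial_ids == trial
        # Sample SD of the baseline measurements in this trial.
        trial_sd = baseline[in_trial].std(ddof=1)
        standardised[in_trial] = scores[in_trial] / trial_sd
    return standardised


if __name__ == "__main__":
    # Trial B records the same patients on a scale ten times wider than trial A.
    base_a = np.array([4.0, 6.0, 5.0, 7.0, 3.0])
    follow_a = np.array([3.0, 4.0, 4.0, 5.0, 2.0])
    ids = np.array(["A"] * 5 + ["B"] * 5)
    z = standardise_by_trial(
        np.concatenate([follow_a, 10 * follow_a]),
        np.concatenate([base_a, 10 * base_a]),
        ids,
    )
    print(z)
```

Dividing by a within-trial standard deviation removes a pure change of measurement scale, but it also attributes any genuine difference in outcome variability between trials to the instrument, which is the assumption behind this kind of standardisation.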

This study has a number of limitations. The applicability of these methods is conditional on access to IPD. If IPD is unavailable or only partially available, other methods need to be used and their limitations stressed. Often a mixture of IPD and AD is available: anecdotally, a 50 % success rate in obtaining IPD is attained in the academic world, and lower success rates may be achieved elsewhere, where it is common to have, for instance, only a company’s own RCT data and not those for competitor interventions. In the context of continuous outcomes the advantages of access to IPD are significant and efforts to share data should be pursued. As access to IPD for all studies in all NMAs is likely to be unrealistic in the medium term, it would be useful to have a methodology that retains the advantages of the ANCOVA approach but can be used when only some (or even no) studies in the database are available in IPD form.

Additionally, there are a series of limitations related to the case study. Firstly, the synthesis of heterogeneous outcomes relied on imperfect standardisation processes (which assume that any differences in within-trial outcome variability are due to the use of different instruments) and on mappings which are typically able to explain only a minority of the variation in EQ-5D. The availability of key outcomes across trials would have reduced these concerns, as would the collection of generic preference-based measures of HRQoL in all trials. Also, the outcome data closest to 3 months were selected for synthesis. For some trials, the nearest reported outcome data were at 1 or 2 months only. If the effect of acupuncture increases gradually, these earlier measurements may underestimate 3-month outcomes. Furthermore, some of the trials show increased benefits of acupuncture over comparators at 12 [94] and 24 months [95] compared to 3 months. This evidence may be an indication of the long-term clinical benefits of acupuncture and has implications for estimating long-term HRQoL and CE. Collection of trial data for more than 3 months is therefore warranted, together with further work analysing repeated outcome measurements in an NMA to evaluate the importance of these effects.

A complete-case analysis was conducted. This approach to missing data is well documented in the methods literature as suboptimal: it can lead to bias if observations with missing values differ systematically from the complete cases, and it may inflate standard errors because of the reduction in sample size. Some recent work has been done in this area [96], although it does not consider the case where IPD is available. Finally, another potential issue for future exploration is that the impact of each pain condition on treatment effects was assumed to be exchangeable [82]; this assumption could be explored further by comparing different assumptions about the inclusion of the interaction effects, or even a model with no interaction effects. In summary, a worthwhile extension to this work would be to develop a multivariate ANCOVA modelling framework that considers multiple endpoints and time points, handles missing data, and enables relevant aggregate data to be included, building on recent work [15, 97–101].


In conclusion, this paper has reiterated the importance of accessing and analysing IPD and presented methods to fully exploit the benefits of access to these data in the context of continuous outcomes. Methods for conducting ANCOVA IPD NMA of continuous outcomes are presented and discussed. The methods developed are applicable to contexts in which endpoints are reported consistently and to contexts in which outcome measures differ across trials. Given the demonstrable benefits of access to IPD, we suggest that more effort should be made to share and develop repositories for data in this format [102].


  1. Considering $S_{t,tx}$ as the standardised value of the pain measurement p made at time point t in patients under treatment tx, it can be demonstrated that $(S_{t_1,tx_1} - S_{t_0,tx_1}) - (S_{t_1,tx_0} - S_{t_0,tx_0}) = (S_{t_1,tx_1} - S_{t_1,tx_0}) - (S_{t_0,tx_1} - S_{t_0,tx_0}) = \Delta\mathrm{SMD}$.

  2. The selection of the outcome to be mapped was not at random. Preference was given to generic preference-based instruments (i.e. SF-12 and SF-36) and, in their absence, to condition-specific ones (i.e. WOMAC, VAS pain and CMS), conditional on the existence of a valid and published algorithm. WOMAC was used in preference to VAS pain and CMS as it covers a broader definition of HRQoL.

  3. For SF-36, a random effects generalised least squares algorithm considering dimensions, dimensions squared and interactions from Rowen et al. [71] was used. For SF-12, a multinomial logit using PCS and MCS summary scores, summary scores squared and interaction terms from Gray et al. [70] was used.

  4. An OLS including total WOMAC score, total WOMAC squared, age and gender as covariates from Barton et al. [73] was used. An OLS including VAS pain and VAS pain squared as covariates from Maund et al. [72] was used.

  5. A mapping process involves additional sources of uncertainty: uncertainty in the mapping function regression coefficients and in the structure of the mapping model. These additional sources of uncertainty are not accounted for in this analysis.
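
The identity stated in note 1 above is a regrouping of the same four terms. A short derivation, using the notation of that note and assuming only that the four standardised means are well defined, can be written as:

```latex
\begin{align*}
(S_{t_1,tx_1} - S_{t_0,tx_1}) - (S_{t_1,tx_0} - S_{t_0,tx_0})
  &= S_{t_1,tx_1} - S_{t_0,tx_1} - S_{t_1,tx_0} + S_{t_0,tx_0} \\
  &= (S_{t_1,tx_1} - S_{t_1,tx_0}) - (S_{t_0,tx_1} - S_{t_0,tx_0}) \\
  &= \Delta\mathrm{SMD}
\end{align*}
```

In words, the between-arm difference in standardised change from baseline equals the change over time in the between-arm standardised difference.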



Aggregate data


Analysis of covariance


Acupuncture Trialists’ Collaboration


Body mass index




Constant Murley Score


Credibility interval


Deviance information criterion


EuroQol five-dimensional


Health-related quality of life


Individual patient data


Missing at random


Markov chain Monte Carlo


Mental component summary


National Institute for Health and Care Excellence


National Institute for Health Research


Network meta-analysis


Osteoarthritis of the knee


Ordinary least squares


Physical component summary


Quality-adjusted life-year


Residual deviance


Short Form 12 dimensions


Short Form 36 dimensions


Standardised mean difference


Total residual deviance


Tension-type headache


United Kingdom


Visual analogue scale


Western Ontario and McMaster Universities Arthritis Index


  1. Lu G, Ades A, Sutton A, Cooper N, Briggs A, Caldwell D. Meta-analysis of mixed treatment comparisons at multiple follow-up times. Stat Med. 2007;26(20):3681–99.

  2. Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med. 2002;21(16):2313–24.

  3. Lu G, Ades A. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105–24.

  4. Ades A, Sculpher M, Sutton A, Abrams K, Cooper N, Welton N, Lu G. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics. 2006;24(1):1–19.

  5. Riley RD, Simmonds MC, Look MP. Evidence synthesis combining individual patient data and aggregate data: a systematic review identified current practice and possible methods. J Clin Epidemiol. 2007;60(5):431–9.

  6. Riley RD, Dodd SR, Craig JV, Thompson JR, Williamson PR. Meta-analysis of diagnostic test studies using individual patient data and aggregate data. Stat Med. 2008;27(29):6111–36.

  7. Riley RD, Lambert PC, Staessen JA, Wang J, Gueyffier F, Thijs L, Boutitie F. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med. 2008;27(11):1870–93.

  8. Sutton A, Kendrick D, Coupland C. Meta-analysis of individual- and aggregate-level data. Stat Med. 2008;27(5):651–69.

  9. Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. Brit Med J. 2010;340:c221.

  10. Saramago P, Sutton A, Cooper N, Manca A. Mixed treatment comparisons using aggregate and individual participant level data. Stat Med. 2012;31(28):3516–36.

  11. Dias S, Welton N, Caldwell D, Ades A. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29(7–8):932–44.

  12. Donegan S, Williamson P, D'Alessandro U, Smith CT. Assessing the consistency assumption by exploring treatment by covariate interactions in mixed treatment comparison meta-analysis: individual patient-level covariates versus aggregate trial-level covariates. Stat Med. 2012;31(29):3840–57.

  13. Saramago P, Manca A, Sutton A. Deriving input parameters for cost-effectiveness modeling: Taxonomy of data types and approaches to their statistical synthesis. Value Health. 2012;15(5):639–49.

  14. Saramago P, Chuang LH, Soares MO. Network meta-analysis of (individual patient) time to event data alongside (aggregate) count data. BMC Med Res Methodol. 2014;14:105.

  15. Hong H, Fu H, Price KL, Carlin BP. Incorporation of individual-patient data in network meta-analysis for multiple continuous endpoints, with application to diabetes treatment. Stat Med. 2015;34(20):2794–819.

  16. Thom HH, Capkun G, Cerulli A, Nixon RM, Howard LS. Network meta-analysis combining individual patient and aggregate data from a mixture of study designs with an application to pulmonary arterial hypertension. BMC Med Res Methodol. 2015;15:34.

  17. Deeks JJ, Higgins JPT, Altman DG. Chapter 9: analysing data and undertaking meta-analyses. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions version 5.0.1 [updated September 2008]. The Cochrane Collaboration; 2008. Available from

  18. Fu R, Vandermeer BW, Shamliyan TA, O’Neil ME, Yazdi F, Fox SH, Morton SC. Handling Continuous Outcomes. In: Quantitative Synthesis. Methods Guide for Comparative Effectiveness Reviews. (Prepared by the Oregon Evidence-based Practice Center under Contract No. 290-2007-10057-I.) AHRQ Publication No. 13-EHC103-EF. Rockville, MD: Agency for Healthcare Research and Quality. July 2013. Available from

  19. Vickers A, Altman D. Analysing controlled trials with baseline and follow up measurements. Brit Med J. 2001;323(7321):1123–4.

  20. Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 2: A generalised linear modelling framework for pairwise and network meta-analysis of randomised controlled trials. NICE; 2011. Available from

  21. Higgins J, Green S. Cochrane handbook for systematic reviews of interventions, Version 5.1.0 [updated March 2011]. edn. Oxford: Wiley-Blackwell; 2011.

  22. Riley R, Kauser I, Bland M, Thijs L, Staessen J, Wang J, Gueyffier F, Deeks J. Meta-analysis of randomised trials with a continuous outcome according to baseline imbalance and availability of individual participant data. Stat Med. 2013;32(16):2747–66.

  23. Scott D, Boye K, Timlin L, Clark J, Best J. A network meta-analysis to compare glycaemic control in patients with type 2 diabetes treated with exenatide once weekly or liraglutide once daily in comparison with insulin glargine, exenatide twice daily or placebo. Diabetes Obes Metab. 2013;15(3):213–23.

  24. Cooper N, Sutton A, Lu G, Khunti K. Mixed comparison of stroke prevention treatments in individuals with nonrheumatic atrial fibrillation. Arch Intern Med. 2006;166(12):1269–75.

  25. Nikolakopoulou A, Chaimani A, Veroniki A, Vasiliadis H, Schmid C, Salanti G. Characteristics of networks of interventions: a description of a database of 186 published networks. Plos One. 2014;9(1):e86754.

  26. NICE. NICE guideline on low back pain: early management of persistent non-specific low back pain. 2009.

  27. NICE. NICE guideline on diagnosis and management of headaches in young people and adults. 2012.

  28. NICE. NICE guideline on osteoarthritis: the care and management of osteoarthritis in adults. 2014.

  29. Latimer N. NICE guideline on osteoarthritis: is it fair to acupuncture? Yes. Acupunct Med. 2009;27(2):72–5.

  30. Latimer NR, Bhanu AC, Whitehurst DG. Inconsistencies in NICE guidance for acupuncture: reanalysis and discussion. Acupunct Med. 2012;30(3):182–6.

  31. Cummings M. Why recommend acupuncture for low back pain but not for osteoarthritis? A commentary on recent NICE guidelines. Acupunct Med. 2009;27(3):128–9.

  32. White A. NICE guideline on osteoarthritis: is it fair to acupuncture? No. Acupunct Med. 2009;27(2):70–2.

  33. MacPherson H, Vickers A, Bland B, Torgerson D, Corbett M, Spackman E, Saramago P, Woods B, Sculpher M, Manca A, et al. Acupuncture for chronic pain and depression in primary care: a programme of research. NIHR J. 2016. (in press).

  34. Vickers AJ, Cronin AM, Maschino AC, Lewith G, Macpherson H, Victor N, Sherman KJ, Witt C, Linde K. Individual patient data meta-analysis of acupuncture for chronic pain: protocol of the Acupuncture Trialists’ Collaboration. Trials. 2010;11:90.

  35. Vickers AJ, Cronin AM, Maschino AC, Lewith G, MacPherson H, Foster NE, Sherman KJ, Witt CM, Linde K. Acupuncture for chronic pain: individual patient data meta-analysis. Arch Intern Med. 2012;172(19):1444–53.

  36. Berman BM, Lao L, Langenberg P, Lee WL, Gilpin AM, Hochberg MC. Effectiveness of acupuncture as adjunctive therapy in osteoarthritis of the knee: a randomized, controlled trial. Ann Intern Med. 2004;141(12):901–10.

  37. Vas J, Mendez C, Perea-Milla E, Vega E, Panadero MD, Leon JM, Borge MA, Gaspar O, Sanchez-Rodriguez F, Aguilar I, et al. Acupuncture as a complementary therapy to the pharmacological treatment of osteoarthritis of the knee: randomised controlled trial. BMJ. 2004;329(7476):1216.

  38. Witt C, Brinkhaus B, Jena S, Linde K, Streng A, Wagenpfeil S, Hummelsberger J, Walther HU, Melchart D, Willich SN. Acupuncture in patients with osteoarthritis of the knee: a randomised trial. Lancet. 2005;366(9480):136–43.

  39. Scharf HP, Mansmann U, Streitberger K, Witte S, Kramer J, Maier C, Trampisch HJ, Victor N. Acupuncture and knee osteoarthritis: a three-armed randomized trial. Ann Intern Med. 2006;145(1):12–20.

  40. Witt CM, Jena S, Brinkhaus B, Liecker B, Wegscheider K, Willich SN. Acupuncture in patients with osteoarthritis of the knee or hip: a randomized, controlled trial with an additional nonrandomized arm. Arthritis Rheum. 2006;54(11):3485–93.

  41. Foster NE, Thomas E, Barlas P, Hill JC, Young J, Mason E, Hay EM. Acupuncture as an adjunct to exercise based physiotherapy for osteoarthritis of the knee: randomised controlled trial. BMJ. 2007;335(7617):436.

  42. Williamson L, Wyatt MR, Yein K, Melton JT. Severe knee osteoarthritis: a randomized controlled trial of acupuncture, physiotherapy (supervised exercise) and standard management for patients awaiting knee replacement. Rheumatology (Oxford). 2007;46(9):1445–9.

  43. Vickers AJ, Rees RW, Zollman CE, McCarney R, Smith CM, Ellis N, Fisher P, Van Haselen R, Wonderling D, Grieve R. Acupuncture of chronic headache disorders in primary care: randomised controlled trial and economic analysis. Health Technol Assess. 2004;8(48):iii. 1–35.

  44. Linde K, Streng A, Jurgens S, Hoppe A, Brinkhaus B, Witt C, Wagenpfeil S, Pfaffenrath V, Hammes MG, Weidenhammer W, et al. Acupuncture for patients with migraine: a randomized controlled trial. JAMA. 2005;293(17):2118–25.

  45. Melchart D, Streng A, Hoppe A, Brinkhaus B, Witt C, Wagenpfeil S, Pfaffenrath V, Hammes M, Hummelsberger J, Irnich D, et al. Acupuncture in patients with tension-type headache: randomised controlled trial. BMJ. 2005;331(7513):376–82.

  46. Diener HC, Kronfeld K, Boewing G, Lungenhausen M, Maier C, Molsberger A, Tegenthoff M, Trampisch HJ, Zenz M, Meinert R. Efficacy of acupuncture for the prophylaxis of migraine: a multicentre randomised controlled clinical trial. Lancet Neurol. 2006;5(4):310–6.

  47. Endres HG, Bowing G, Diener HC, Lange S, Maier C, Molsberger A, Zenz M, Vickers AJ, Tegenthoff M. Acupuncture for tension-type headache: a multicentre, sham-controlled, patient-and observer-blinded, randomised trial. J Headache Pain. 2007;8(5):306–14.

  48. Jena S, Witt CM, Brinkhaus B, Wegscheider K, Willich SN. Acupuncture in patients with headache. Cephalalgia. 2008;28(9):969–79.

  49. Kleinhenz J, Streitberger K, Windeler J, Gussbacher A, Mavridis G, Martin E. Randomised clinical trial comparing the effects of acupuncture and a newly designed placebo needle in rotator cuff tendinitis. Pain. 1999;83(2):235–41.

  50. Carlsson CP, Sjolund BH. Acupuncture for chronic low back pain: a randomized placebo-controlled study with long-term follow-up. Clin J Pain. 2001;17(4):296–305.

  51. Irnich D, Behrens N, Molzen H, Konig A, Gleditsch J, Krauss M, Natalis M, Senn E, Beyer A, Schops P. Randomised trial of acupuncture compared with conventional massage and “sham” laser acupuncture for treatment of chronic neck pain. BMJ. 2001;322(7302):1574–8.

  52. Kerr DP, Walsh DM, Baxter D. Acupuncture in the management of chronic low back pain: a blinded randomized controlled trial. Clin J Pain. 2003;19(6):364–70.

  53. de Hoyos JA G, Andres Martin Mdel C, Bassas y Baena de Leon E, Vigara Lopez M, Molina Lopez T, Verdugo Morilla FA, Gonzalez Moreno MJ. Randomised trial of long term effect of acupuncture for shoulder pain. Pain. 2004;112(3):289–98.

  54. White P, Lewith G, Prescott P, Conway J. Acupuncture versus placebo for the treatment of chronic mechanical neck pain: a randomized, controlled trial. Ann Intern Med. 2004;141(12):911–9.

  55. Brinkhaus B, Witt CM, Jena S, Linde K, Streng A, Wagenpfeil S, Irnich D, Walther HU, Melchart D, Willich SN. Acupuncture in patients with chronic low back pain: a randomized controlled trial. Arch Intern Med. 2006;166(4):450–7.

  56. Salter GC, Roman M, Bland MJ, MacPherson H. Acupuncture for chronic neck pain: a pilot for a randomised controlled trial. BMC Musculoskelet Disord. 2006;7:99.

  57. Vas J, Perea-Milla E, Mendez C, Sanchez Navarro C, Leon Rubio JM, Brioso M, Garcia Obrero I. Efficacy and safety of acupuncture for chronic uncomplicated neck pain: a randomised controlled study. Pain. 2006;126(1–3):245–55.

  58. Thomas KJ, MacPherson H, Ratcliffe J, Thorpe L, Brazier J, Campbell M, Fitter M, Roman M, Walters S, Nicholl JP. Longer term clinical and economic benefits of offering acupuncture care to patients with chronic low back pain. Health Technol Assess. 2005;9(32):iii−+.

  59. Witt CM, Jena S, Brinkhaus B, Liecker B, Wegscheider K, Willich SN. Acupuncture for patients with chronic neck pain. Pain. 2006;125(1–2):98–106.

  60. Witt CM, Jena S, Selim D, Brinkhaus B, Reinhold T, Wruck K, Liecker B, Linde K, Wegscheider K, Willich SN. Pragmatic randomized trial evaluating the clinical and economic effectiveness of acupuncture for chronic low back pain. Am J Epidemiol. 2006;164(5):487–96.

  61. Haake M, Muller HH, Schade-Brittinger C, Basler HD, Schafer H, Maier C, Endres HG, Trampisch HJ, Molsberger A. German Acupuncture Trials (GERAC) for chronic low back pain: randomized, multicenter, blinded, parallel-group trial with 3 groups. Arch Intern Med. 2007;167(17):1892–8.

  62. Kennedy S, Baxter GD, Kerr DP, Bradbury I, Park J, McDonough SM. Acupuncture for acute non-specific low back pain: a pilot randomised non-penetrating sham controlled trial. Complement Ther Med. 2008;16(3):139–46.

  63. Vas J, Ortega C, Olmo V, Perez-Fernandez F, Hernandez L, Medina I, Seminario JM, Herrera A, Luna F, Perea-Milla E, et al. Single-point acupuncture and physiotherapy for the treatment of painful shoulder: a multicentre randomized controlled trial. Rheumatology (Oxford). 2008;47(6):887–93.

  64. Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Making. 2013;33(5):607–17.

  65. Takeshima N, Sozu T, Tajika A, Ogawa Y, Hayasaka Y, Furukawa TA. Which is more generalizable, powerful and interpretable in meta-analyses, mean difference or standardized mean difference? BMC Med Res Methodol. 2014;14:30.

  66. Brooks R, De Charro F. EuroQol: The current state of play. Health Policy. 1996;37(1):53–72.

  67. Williams A. Euroqol - a New Facility for the Measurement of Health-Related Quality-of-Life. Health Policy. 1990;16(3):199–208.

  68. NICE. Guide to the methods of technology appraisal. London: National Institute for Health and Clinical Excellence, Institute’s Decision Support Unit; 2013.

  69. Kind P, Hardman G, Macran S. UK Population Norms for EQ-5D. In: Centre for Health Economics. 1999.

  70. Gray A, Rivero-Arias O, Clarke P. Estimating the association between SF-12 responses and EQ-5D utility values by response mapping. Med Decis Mak. 2006;26(1):18–29.

  71. Rowen D, Brazier J, Roberts J. Mapping SF-36 onto the EQ-5D index: how reliable is the relationship? Health and Quality of Life Outcomes. 2009;7(1):1–9.

  72. Maund E, Craig D, Suekarran S, Neilson A, Wright K, Brealey S, Dennis L, Goodchild L, Hanchard N, Rangan A, et al. Management of frozen shoulder: a systematic review and cost-effectiveness analysis. Health Technol Assess. 2012;16(11):i-xvi+1-243.

  73. Barton G, Sach T, Jenkinson C, Avery A, Doherty M, Muir K. Do estimates of cost-utility based on the EQ-5D differ from those based on the mapping of utility scores? Health Qual Life Outcomes. 2008;6:51.

  74. Khan K. Mapping outcomes data to estimate health state utilities:An analysis of individual patient level data from 15 acupuncture trials. Department of Economics and Related Studies. UK: University of York, 2011.

  75. Chan KK, Willan AR, Gupta M, Pullenayegum E. Underestimation of uncertainties in health utilities derived from mapping algorithms involving health-related quality-of-life measures: statistical explanations and potential remedies. Med Decis Making. 2014;34(7):863–72.

  76. Trowman R, Dumville JC, Torgerson DJ, Cranny G. The impact of trial baseline imbalances should be considered in systematic reviews: a methodological case study. J Clin Epidemiol. 2007;60(12):1229–33.

  77. Aiello F, Attanasio M, Tine F. Assessing covariate imbalance in meta-analysis studies. Stat Med. 2011;30(22):2671–82.

  78. Lu GB, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101(474):447–59.

  79. Higgins J, Whitehead A, Turner R, Omar R, Thompson S. Meta-analysis of continuous outcome data from individual patients. Stat Med. 2001;20(15):2219–41.

  80. Senn S. Change from baseline and analysis of covariance revisited. Stat Med. 2006;25(24):4334–44.

  81. Van Breukelen GJ. ANCOVA versus change from baseline: more power in randomized studies, more bias in nonrandomized studies [corrected]. J Clin Epidemiol. 2006;59(9):920–5.

  82. Cooper N, Sutton A, Morris D, Ades A, Welton N. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: Application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat Med. 2009;28(14):1861–81.

  83. Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009;172(1):21–47.

  84. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J Roy Stat Soc B. 2002;64:583–616.

  85. Welton NJ, Sutton AJ, Cooper NJ, Abrams KR, Ades AE. Evidence synthesis for decision making in healthcare. First Edition, 2012. John Wiley & Sons, Ltd.

  86. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014.

  87. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000;10(4):325–37.

  88. Sturtz S, Ligges U, Gelman A. R2WinBUGS: a package for running WinBUGS from R. J Stat Softw. 2005;12(3):1–16.

  89. Plummer M, Best N, Cowles K, Vines K. CODA: convergence diagnosis and output analysis for MCMC. R News. 2006;6(1):7.

  90. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457–72.

  91. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. J Comput Graph Stat. 1998;7(4):434–55.

  92. Donegan S, Williamson P, D’Alessandro U, Garner P, Smith CT. Combining individual patient data and aggregate data in mixed treatment comparison meta-analysis: Individual patient data may be beneficial if only for a subset of trials. Stat Med. 2013;32(6):914–30.

  93. Cherkin D, Eisenberg D, Sherman K, Barlow W, Kaptchuk T, Street J, Deyo R. Randomized trial comparing traditional Chinese medical acupuncture, therapeutic massage, and self-care education for chronic low back pain. Arch Intern Med. 2001;161(8):1081–8.

  94. Vickers AJ, Rees RW, Zollman CE, McCarney R, Smith CM, Ellis N, Fisher P, Van Haselen R. Acupuncture for chronic headache in primary care: large, pragmatic, randomised trial. BMJ. 2004;328(7442):744.

  95. Thomas K, MacPherson H, Thorpe L, Brazier J, Fitter M, Campbell M, Roman M, Walters S, Nicholl J. Randomised controlled trial of a short course of traditional acupuncture compared with usual care for persistent non-specific low back pain. Brit Med J. 2006;333(7569):623–6.

  96. Mavridis D, White I, Higgins J, Cipriani A, Salanti G. Allowing for uncertainty due to missing continuous outcome data in pairwise and network meta-analysis. Stat Med. 2014;34(5):721–41.

  97. Bujkiewicz S, Thompson J, Sutton A, Cooper N, Harrison M, Symmons D, Abrams K. Multivariate meta-analysis of mixed outcomes: a Bayesian approach. Stat Med. 2013;32(22):3926–43.

  98. Bujkiewicz S, Thompson JR, Sutton AJ, Cooper NJ, Harrison MJ, Symmons DP, Abrams KR. Use of Bayesian multivariate meta-analysis to estimate the HAQ for mapping onto the EQ-5D questionnaire in rheumatoid arthritis. Value Health. 2014;17(1):109–15.

  99. Achana FA, Cooper NJ, Bujkiewicz S, Hubbard SJ, Kendrick D, Jones DR, Sutton AJ. Network meta-analysis of multiple outcome measures accounting for borrowing of information across outcomes. BMC Med Res Methodol. 2014;14:92.

  100. Bujkiewicz S, Thompson JR, Riley RD, Abrams KR. Bayesian meta-analytical methods to incorporate multiple surrogate endpoints in drug development process. Stat Med. 2016;35(7):1063–89.

  101. Jenkins D, Martina R, Bujkiewicz S, Dequen P, Abrams K. Network meta-analysis of biological response modifiers in rheumatoid arthritis including real world evidence at multiple time points. Value Health. 2015;18(7):A343.

  102. Tudor Smith C, Dwan K, Altman D, Clarke M, Riley R, Williamson PR. Sharing individual participant data from clinical trials: an opinion survey regarding the establishment of a central repository. PLoS One. 2014;9(5):e97886.


The views expressed in this presentation are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. The ATC is funded by an R21 [AT004189I] from the National Center for Complementary and Alternative Medicine (NCCAM) at the National Institutes of Health (NIH) to AV and by a grant from the Samueli Institute. AM’s contribution was made under the terms of a Career Development research training fellowship issued by the NIHR [grant CDF-2009-02-21]. The study sponsors had no role in the study design; collection, analysis and interpretation of the data; in the writing of the manuscript; or in the decision to submit the manuscript for publication.


This work was supported by the National Institute for Health Research (NIHR) under Programme Grants for Applied Research [Grant No. RP-PG-0707-10186].

Availability of data and materials

This study is a meta-analysis of data collected by authors other than those named on the authorship list. Where these data are publicly available, they are included in the supplementary material. The authors are unable to give unrestricted access to data that they did not collect. In particular, some of the trialists who provided data did not do so under ethical approvals that allowed unrestricted data sharing.

The WinBUGS model code relating to the newly developed modelling approach is provided in the additional supporting files.

Authors’ contributions

All authors made a substantial contribution to this work. PS, lead analyst and main author of this work, contributed extensively to all aspects of the statistical methods development, analysis and write-up. BW contributed to all aspects of the methods development, analysis and write-up. HW co-ordinated the research and contributed to the analysis and write-up. AM co-ordinated the research and advised on aspects of the methods development and write-up. KK contributed to the analysis and write-up. MS co-ordinated the research and advised on aspects of the methods development and write-up. AJV advised on aspects of the methods development and write-up. HM co-ordinated the research and advised on aspects of the analysis and write-up. All authors read and approved the final manuscript.

Competing interests

HM reports that he practices acupuncture part-time and is a Trustee of the not-for-profit organisation, the Northern College of Acupuncture. None of the other authors has competing interests for this work.

Consent to publish

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information

Corresponding author

Correspondence to Pedro Saramago.

Additional files

Additional file 1:

Provides summary information of mapped EQ-5D and standardised pain data by study. (DOCX 105 kb)

Additional file 2:

Provides WinBUGS modelling code for models 1 and 2 as described in the main text. (DOCX 41 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Saramago, P., Woods, B., Weatherly, H. et al. Methods for network meta-analysis of continuous outcomes using individual patient data: a case study in acupuncture for chronic pain. BMC Med Res Methodol 16, 131 (2016).
