Dose-response meta-analysis of differences in means

Background Meta-analytical methods are frequently used to combine dose-response findings expressed in terms of relative risks. However, no methodology has been established when results are summarized in terms of differences in means of quantitative outcomes. Methods We proposed a two-stage approach. A flexible dose-response model is estimated within each study (first stage) taking into account the covariance of the data points (mean differences, standardized mean differences). Parameters describing the study-specific curves are then combined using a multivariate random-effects model (second stage) to address heterogeneity across studies. Results The method is fairly general and can accommodate a variety of parametric functions. Compared to traditional non-linear models (e.g. Emax, logistic), spline models do not assume any pre-specified dose-response curve. Spline models allow inclusion of studies with a small number of dose levels, and almost any shape, even non monotonic ones, can be estimated using only two parameters. We illustrated the method using dose-response data arising from five clinical trials on an antipsychotic drug, aripiprazole, and improvement in symptoms in shizoaffective patients. Using the Positive and Negative Syndrome Scale (PANSS), pooled results indicated a non-linear association with the maximum change in mean PANSS score equal to 10.40 (95 % confidence interval 7.48, 13.30) observed for 19.32 mg/day of aripiprazole. No substantial change in PANSS score was observed above this value. An estimated dose of 10.43 mg/day was found to produce 80 % of the maximum predicted response. Conclusion The described approach should be adopted to combine correlated differences in means of quantitative outcomes arising from multiple studies. Sensitivity analysis can be a useful tool to assess the robustness of the overall dose-response curve to different modelling strategies. A user-friendly R package has been developed to facilitate applications by practitioners.


Background
The identification and characterization of dose-response relationships is an essential part of the analysis in many scientific disciplines such as toxicology, pharmacology, and epidemiology. This is particularly important in the development and testing of new compounds (e.g. a new drug, pharmaceutical treatment) where trials at different stages aim to evaluate the efficacy of increasing levels of dosage (Phase II-III trials) or to derive a dose-response curve for selection of optimal doses (Phase IV trials) [1,2]. Randomized clinical trials often investigate a continuous outcome variable, such as the efficacy or safety of a drug, reporting the change from baseline of a medical score, or the final value of a clinical measurement. The dose-response results are typically summarized by dosespecific means and standard deviations [3]. Measures of effect are expressed in terms of mean or standardized mean differences using a dose level, usually the placebo group, as referent [1]. Over the last few years methodological research focused on developing and improving methods for performing dose-response analysis in a single study [4,5]. A conclusive result is hardly obtained by a single investigation and there is often the need to synthesize information collected from multiple studies. In such a case meta-analytic methods can be used to define an overall relation or to investigate heterogeneity across study findings.
A method for pooling aggregated dose-response data where the outcome is a log relative risk was originally presented by Greenland and Longnecker in 1992 [6]. Since then, several papers have refined and covered specific aspects of the methodology such as model specification [7,8], modeling strategies [9,10], and software implementation [11,12]. Other methodological articles extended the approach for continuous outcome but in the case where individual patient data are available, mainly in the context of time-series environmental studies [13][14][15].
Only a few alternatives have been proposed to pool aggregated dose-response data where the findings are summarized by differences in means. Davis and Chen [16] in 2004 described a methodology for summarizing dose-response curves of first and second generation antipsychotics in schizoaffective patients. The authors reconstructed drug-specific dose-response curves and conducted a meta-analysis to compare the effectiveness of medium vs high dosages. A common alternative to analyze the drug effect consists of fitting a random-intercept E max model, where the random component accounts for heterogeneity in placebo effect across trials [17]. Heterogeneity, however, may be related to other study characteristics rather than differences in placebo response such as implementation, participants, intervention, and outcome definition. Thomas et al. [18] adopted hierarchical Bayesian models to summarize and describe, independently, the distribution of study-specific model parameters derived from an E max model.
The mentioned strategies assumed pre-specified models that do not allow for non-monotonic curves which may occur in practice [19], as in case of dose-response data of antipsychotics. In addition, fitting study-specific sigmoidal curves such as the E max model requires that the single studies have assessed at least three dose levels in order to estimate model parameters. Discarding studies not providing enough data points represents a loss of information and may introduce bias.
The aim of this paper is to formalize and propose a general and flexible methodology to pool dose-response relations from aggregated data where the changes in the distribution of the quantitative outcome are expressed in terms of differences in means. We first present the data necessary for a dose-response meta-analysis and derive formulas for obtaining effect sizes and their variance/covariance structure. We describe flexible doseresponse models with particular emphasis on regression splines. The method is then applied to dose-response data from clinical trials on use of aripiprazole and symptoms improvement in schizoaffective patients.

Dose-response data
The notation and data required for a dose-response metaanalysis for a generic study are displayed in Table 1. We consider I studies indexed by i = 1, . . . , I reporting the results of a common treatment at different dose levels x ij , j = 1, . . . , J i , where x 0i = 0 indicates the control or placebo group in the i-th study. The study-specific results typically consist of dose-specific means of an outcome variable, Y ij , that measures the efficacy of the j-th dose in the i-th study [3]. Additional information about the number of patients allocated in each treatment, n ij , and the sample standard deviations of Y ij , sd ij , is generally reported or obtained from the study-specific results.

Effect sizes and their variance/covariance
A common way to reduce heterogeneity in placebo response is to compute the effect size (or treatment effect) as difference between dose-specific means and placebo mean. In case all studies measure the outcome on a common and interpretable scale, the difference can be based on the absolute scale Assuming common study-specific population standard deviations, the variance of d ij is defined as where s 2 p i = J i j=0 n ij − 1 sd 2 ij / J i j=0 n ij − 1 is the square of the pooled standard deviation for the i-th study. Since the study-specific mean differences d ij use the same referent values,Ȳ i0 , they cannot be regarded as independent. The covariance term is defined as In case the outcome is measured on different scales the effect sizes can be based on standardized mean differences with

Dose-response analysis
The chosen effect sizes and the corresponding (co)variances are used to estimate the study-specific doseresponse curves. The dose-response curves characterize the relative efficacy of the dose under investigation using the placebo effect as referent (i.e. the relative efficacy for the placebo is zero by definition). The dose-response models are expressed through the parametric model f , which specifies how the effect size varies according to the dose values. The functional relationship f is parametrized in terms of θ i , the p × 1 vector of dose-response coefficients. We consider the case of mean differences, d ij , but the same principles apply for standardized mean differences, d * ij . The study-specific curves can be written as i is the covariance matrix of the residual error term, with Var d ij along the diagonal and Cov d ij , d ij offdiagonal.
Several alternatives are available to model the doseresponse curve (i.e. for the choice of f ). Table 2 shows the definition of 4 types of models ordered according to the number of parameters, ranging from 1 for a linear model up to 3 for a logistic model. See Bretz et al. for a comprehensive description [2].
The most common choice in dose findings [2] is the use of the E max model which is expressed in terms of three parameters: the maximum effect (θ 1i ), the dose to produce half of the maximum effect (θ 2i ) and the steepness of the curve (θ 3i ) [20]. As other non-linear models, the E max model assumes a specific shape that does not allow for non-monotonic curve and its estimation requires at least three non reference dose levels. Quadratic models are defined by only p = 2 coefficients but may poorly fit at extreme dose values [9]. Other non-linear models such as logistic and sigmoidal models, are commonly defined by p ≥ 3 coefficients so that study-specific aggregated data may not be sufficient to estimate the parameters. We propose the use of regression splines to flexibly model the dose of interest. Splines represent a family of smooth functions that can describe a wide range of curves (i.e. U-shaped, J-shaped, S-shaped, threshold) [21]. The curves consist of piecewise polynomials over consecutive intervals defined by k knots. Their use may facilitate curve fitting since many non-linear curves can be examined by estimating only a small number of coefficients. For instance, a restricted cubic spline model with three with two transformations [23] defined as where the '+' notation, with u + = u if u ≥ 0 and u + = 0 otherwise, has been used. An alternative flexible approach to model the doseresponse association is represented by fractional polynomials. In particular, a dose-response model based on fractional polynomial of order two can be written as in Eq. 6 with the two transformations defined as for each combination of p 1 and p 2 in the predefined set of values {−2, −1, −0.5, 0, 0.5, 1, 2, 3}; for p = 0, x p becomes log(x). The best fitting fractional polynomial is typically chosen based on the Akaike's Information Criterion [24].
Once the functional relation f has been selected, generalized least square estimation can be performed to efficiently estimates the dose-response coefficientsθ j and the corresponding (co)variance matrixV j , by minimizing that generally requires numerical optimization algorithms. If f is a linear combination of the parameters θ i , as in the case of regression splines and fractional polynomials, the close solution can be written aŝ where X i indicates the J i ×p design matrix in the i-th study.

Meta-analysis
The estimated study-specific dose-response coefficientŝ θ i and the accompanying (co)variance matricesV i are combined by means of multivariate meta-analysiŝ A fixed-effects model assumes no statistical heterogeneity among study results, i.e. differences in the doseresponse coefficients are only related to sampling error.
The assumption of homogeneity may not hold in practice, unless it is known that the studies are performed in a similar way and are sampled from the same population [25]. The Cochran's Q test [26] is typically used to test statistical heterogeneity across studies (H 0 : = 0) [27]. Selected studies, however, will typically differ with respect to study design and implementation, selection of participants, and type of analyses. A certain degree of heterogeneity is expected and should be taken into account in the analysis. A random-effects model allows the doseresponse coefficients, θ i , to vary across studies. Statistical heterogeneity is captured by the between-studies variance while θ represents the mean of the distribution of doseresponse coefficients and an estimate,θ , can be obtained using (restricted) maximum likelihood estimation [15].
As a final result, the pooled dose-response curve can be presented in either a graphical or tabular form by predicting the mean differences of the outcome for a set of x dose values with an approximate (1 − α/2)% confidence interval (CI), that in case f is a linear combination of θ can be expressed as where z α/2 is the α/2-th quantile of a standard normal distribution.

Dose findings
Once the pooled dose-response curve has been estimated, it may be of interest to determine a set of target doses, i.e. doses associated with prespecified outcome effects. In development of new compounds it is often important to select an optimal dose which is almost as effective as the maximum effective dose but has less undesired side effects, which often occur at high dosages. Suppose one wants to determine which is the lowest dose (ED γ ) to produce an almost complete effect, e.g. γ % of the observed maximum predicted response. The ED γ can be determined as where x max is the dose corresponding to the maximum predicted outcome. An important step when presenting results from dose findings analysis is to accompany the previous estimates with a measure of precision, typically confidence intervals. Pinheiro et al. [3] proposed the use a parametric bootstrap approach based on the asymptotic normal distribution of θ , the pooled estimate of the dose-response coefficients. The approach consists in re-sampling the dose-response coefficients θ from its approximate normal distribution and derive the distribution of ED γ based on the samples. Approximated confidence intervals for ED γ can be constructed using percentiles of the sampling distribution.

Results
To illustrate the methodology we examined the dose-response relation between aripiprazole, a secondgeneration antipsychotic, and symptoms improvement in schizoaffective patients. We updated the search strategy presented in a previous review by Davis and Chen [16] by searching the Medline, International Pharmaceutical Abstracts, CINAHL, and the Cochrane Database of Systematic Reviews. To reduce the exclusion of unpublished papers, additional sources including Food and Drug administration website, data from Cochrane reviews, poster presentations and conference abstracts were also searched. All random-assignment, double-blind, controlled clinical trials of schizoaffective patients providing dose-response results for at least two non-zero dosages of aripiprazole were eligible.
Five studies [28][29][30][31][32] met the inclusion criteria and were included in the analysis. All the studies reported mean changes from baseline as main outcome variable, using the Positive and Negative Syndrome Scale (PANSS). The PANSS scale is an ordinal score derived from 30 items ranging from 1 to 7. Computations of ratios such as percentage changes are not directly applicable and may lead to erroneous results [33,34]. To address this issue, the theoretical minimum (i.e. 30) needs to be subtracted from the original score [35]. Information about the means, the number of patients assigned to each treatment, and the standard deviation was available from the published data. Because all the studies measured the outcome variable on the same scale, we computed PANSS mean differences as effect sizes. Data are reported in Table 3.
We used the trial by Cutler et al. [28] to illustrate the steps required for estimating the dose-response curve for a single study. For example, the difference in mean PANSS comparing the dose of 2 mg/day relative to 0 mg/day is d 11  To estimate the dose-response curve we need first to specify the model f . We characterized the dose-response relation using a restricted cubic spline model with three knots located at the 10th, 50th, and 90th percentiles (0, 10, and 30 mg/day) of the overall dose distribution (p = 2). The restricted cubic spline dose-response model is defined as in Eq. 7. Efficient estimates of the dose-response coefficients and (co)variance matrix were obtained by generalized least square estimationθ We applied the same procedure to the other studies included in the meta-analysis in order to obtain the study-specificθ i andV i , i = 1, . . . , 5 ( Table 4). The studyspecific predicted curves are presented in Fig. 1. Under a random-effects model, restricted maximum likelihood estimates werê A p-value < 0.001 for the multivariate Wald-type test H 0 : θ = 0 provided strong evidence against the null hypothesis of no relation between different doses of aripiprazole and mean change PANSS score. The Q test (Q = 3.5, p-value = 0.899) did not detect substantial statistical heterogeneity across studies.
To communicate results of the pooled dose-response analysis, we can estimate the pooled mean differences in PANSS scores using 0 mg/day as referent as 0.937x 1 − 1.156x 2 , together with the corresponding 95 % confidence interval for a generic dose x of interest as following

Sensitivity analysis
A sensitivity analysis is often required to evaluate the robustness of the pooled dose-response curve. In the spline model, for example, the location of the knots may affect the shape of the dose-response curve. Therefore we considered alternative knots locations including different combinations of the 10th, 25th, 50th, 75th and 90th percentiles of the overall dose distribution (0, 0.5, 10, 18.75, and 30 mg/day). A graphical comparison is presented in the left panel of Fig. 3. The alternative curves roughly described the same dose-response shape with no substantial variation, all indicating an increase in the mean change PANSS score up to 20 mg/day of aripiprazole. We can assess whether there is an increasing trend above 20 mg/day by simply re-defining x 2 equal to (x − 20) + in Eq. 8; this approach is known as piecewise linear model. The rate of change in the PANSS mean differences was negative and not statistically significant (θ 1 + θ 2 =-0.284, p = 0.18) after 20 mg/day of aripiprazole.  To evaluate the sensitivity of the dose-response curve to the choice of the parametric model f adopted, instead, we considered three alternatives: fractional polynomials; quadratic; and E max . Since two studies only had two nonreferent doses, the study-specific (sigmoidal) E max models as described in Table 2 cannot be estimated. A common solution is to fix the steepness of the curve θ 3 to be 1, also referred to as hyperbolic E max [20].
The "best" fractional polynomials (p 1 = 0.5, p 2 = 1) provided overall a similar dose-response curve when compared to the spline model, with slightly higher value for the maximum predicted response (right panel of Fig. 3). The hyperbolic E max had substantially higher predicted mean differences for low values of the dose. The non-linear model assumes a specific hyperbolic dose-response curve that did not seem to fit the data and may be dependent from the choice of fixing θ 3 to be 1. The dose-response curve described by the quadratic model fall in between the spline and the hyperbolic E max curves.

Discussion
In this paper we proposed a statistical method to combine differences in means of quantitative outcomes. The method consists of dose-response models estimated within each study (first stage) and an overall curve obtained by pooling study-specific dose-response coefficients (second stage). The covariance among studyspecific mean differences is taken into account in the first stage analysis using generalized least square estimators, while statistical heterogeneity across studies is allowed by multivariate random-effects model in the second stage.
One major strength of the proposed method is that it is fairly general and can accommodate different modeling strategies, including non-linear ones described by Pinheiro et al. [3]. Non-linear models, however, are defined by at least three or four parameters, and hence require an equal number of dose levels for each single study included in the analysis. Given that some studies may have investigated a lower number of dose levels, exclusion of these studies may result in substantial loss of information. In addition, many non-linear models assume a specific behaviour (e.g. monotonicity) requiring a strong a priori information about the dose-response curve. The choice of the parametric model is critical, since it highly influences the final results [3]. Indeed, the selection of the dose-response model should be informed by subject-matter knowledge as well as understanding of the research questions at hand. We presented the use of regression splines as a flexible tool for modeling any quantitative exposure. The major advantage is that a variety of curves, even non monotonic ones, can be estimated using only two parameters. It is considered to be closed to non-parametric regression, since no major assumptions about the shape of the curve are needed [9]. A possible alternative is the use of fractional polynomials. In comparing the two strategies, we did not find important differences between the two strategies and concluded that both are useful tools to characterize a (non-linear) dose-response curve. Nonetheless a sensitivity analysis is generally required to evaluate the robustness of the combined results.
A possible limitation of the proposed methodology is that it requires information about dose-specific means and standard deviations. Studies providing other summary measures, such as dose-specific medians, would not be included the analysis. The dose-response analysis presented in Eq. 6 is based on the asymptotic normal distribution of the conditional mean effect size. Extension of the introduced methodology to percentiles is not straightforward and may represent an interesting topic of future research.
An additional limitation of aggregated dose-response data is that supplementary information for approximating the covariance terms may not be available. Articles may report directly mean or standardized mean differences and standard errors for non-referent dose groups. Whenever the standard deviation for the outcome variable in the control group (s 2 i0 ) cannot be obtained, it may be approximated using the pooled standard deviation based on the non-referent dose levels (s 2 p ij ). Alternatively a specific value may be imputed and a sensitivity analysis can be performed to evaluate how the results of the meta-analysis vary for different values of s 2 0j . Further limitations relate to the general application of metaanalysis based on aggregated data. These include restrictions in subgroup analyses, the impossibility of assessing the appropriateness of individual analyses, and to harmonize variable definitions and analyses for reducing the extent of heterogeneity, as well as specific biases such as aggregation (or ecological) bias in meta-regression models. Meta-analysis of individual patient data, however, are often difficult to undertake especially for the availability of individual data, so that usage of aggregated data may represent the only alternative [36]. Specific to aggregated dose-response data, different dose references and exposure range may complicate the analysis. The presented methodology assume that all the selected studies share a common dose-response model. Important departure from this assumption may limit and/or impact the pooling of individual dose-response coefficients. An alternative methods has been proposed based on a series of univariate meta-analyses of effect sizes for a pre-specified grid of dose-levels [37]. Further work is needed to analyze this possibility and potential advantages. Depending on the extent of heterogeneity of the dose-response curves, however, it may not be opportune to pool study-specific results, and meta-regression or stratified analyses should be performed [38].
In our application, we considered the effectiveness of increasing dosages of aripiprazole in shizoaffective patients. We described the steps needed to obtain the overall dose-response curve and to present it in a graphical form. We observed a non-linear association with the maximum efficacy corresponding to aripiprazole 19.32 mg/day. An estimated dose of 10.46 mg/day, however, may be sufficient to obtain 80 % of the maximum effect, which may be relevant for avoiding possible undesired side effects. Sensitivity analysis showed similar results as compared to fractional polynomials. The E max model presented higher drug efficacy for low dosages. Compared to the previous models, the E max model did not seem to fit properly the data at low dosages.

Conclusions
We described an approach to combine differences in means of a quantitative outcome contrasting different dose levels relative to a placebo in randomized trials. The general framework of the proposed methodology can include a variety of flexible models. Sensitivity analysis can be a useful tool to assess the stability of the overall dose-response curve to different modelling strategies. Although the method was presented for the analysis of randomized trials, it may be extended to observational studies where mean differences are further adjusted for potential confounders. Future work is needed to evaluate the properties of the statistical model and validity of the underlying assumptions. A user friendly procedure is implemented in the dosresmeta R package [39] with worked examples available on GitHub.
Abbreviations PANSS, positive and negative syndrome scale