Analysis of an ordinal outcome in a multicentric randomized controlled trial: application to a 3- arm anti- malarial drug trial in Cameroon

Background Malaria remains a burden in Sub-Saharan Countries. The strategy proposed by the World Health Organization (WHO) is to systematically compare the therapeutic efficacy of antimalarial drugs using as primary outcome for efficacy, a four-category ordered criterion. The objective of the present work was to analyze the treatment effects on this primary outcome taking into account both a center-effect and individual covariates. A three-arm, three-centre trial of Amodiaquine (AQ), sulfadoxine-pyrimethamine (SP) and their combination (AQ + SP), conducted by OCEAC-IRD in 2003, in 538 children with uncomplicated Plasmodium falciparum malaria, is used as an illustration. Methods Analyses were based on ordinal regression methods, assuming an underlying continuous latent variable, using either the proportional odds (PO) or the proportional hazards (PH) models. Different algorithms, corresponding to both frequentist- and bayesian-approaches, were implemented using the freely available softwares R and Winbugs, respectively. The performances of the different methods were evaluated on a simulated data set, and then they were applied on the trial data set. Results Good coverage probability and type-1 error for the treatment effect were achieved. When the methods were applied on the trial data set, results highlighted a significance decrease of SP efficacy when compared to AQ (PO, odds ratio [OR] 0.14, 95% confidence interval [CI] 0.04-0.57; hazard ratio [HR] 0.605, 95% CI 0.42-0.82), and an equal effectiveness between AQ + SP and AQ (PO, odds ratio [OR] 1.70, 95% confidence interval [CI] 0.25-11.44; hazard ratio [HR] 1.40, 95% CI 0.88-2.18). The body temperature was significantly related to the responses. The patient weights were marginally associated to the clinical response. Conclusion The proposed analyses, based on usual statistical packages, appeared adapted to take into account the full information contained in the four categorical outcome in malaria trials, as defined by WHO, with the possibility of adjusting on individual and global covariates.


Background
Many clinical trials have a primary outcome which is measured on a categorical scale. In these situations, the outcome categories are often ordered, either by a direct relationship to either a continuous measurement or a clinical score through pre-defined cut-points, or by a natural interpretation by the physician of a health condition or body reaction.
An overview and recent developments in the analysis of ordered categorical data have been presented in [1][2][3][4]. More specific developments for analyzing ordinal data in the context of meta-analysis of clinical trials have been proposed initially for summary statistics [5,6], secondarily extended to meta-analysis with individual patient's data [7,8].
The main objective of this paper was to adapt and implement the above-proposed methods in the specific context of malaria. A more specific objective was to study the effect of both different treatments and a set of covariates on the primary outcome.
In antimalarial trials, the WHO (World Health Organization) established a clinical study protocol [9,10] to assess the efficacy of antimalarial drugs for uncomplicated malaria, in which the primary outcome is a fourcategory outcome, i) ACPR (adequate clinical and parasitological response); ii) LPF (late parasitological failure); iii) LCF (late clinical failure) and; iv) ETF (early treatment failure). Patient status reflects the ordinal character of the WHO criteria. Danger signs appearing on day 1, 2 or 3 with a persistent parasitaemia correspond to an ETF. The presence of fever and parasitaemia between day 4 and day 14 without an ETF is considered as LCF, whereas the presence of parasitaemia on day 14 without ETF and LCF is a LPF. ACPR corresponds to the absence of parasite regardless the body temperature on day 14 without being classified as either ETF, or LCF or a LPF previously. This suggests that there is an explicit order between the categories. A multi-centre randomized controlled trial of three antimalarial drugs, amodiaquine (AQ), sulfadoxinepyrimethamine (SP) and their combination (AQ + SP), which was conducted in 2003 by OCEAC-IRD in patients with uncomplicated P. falciparum malaria from 3 health centers in Cameroon, according to the WHO protocol [9,10] will be used to illustrate the statistical approach.
Briefly, the 3 centers were selected as they were known to be associated with different levels of transmission. Affected children, 0-5 years old, were the target population, according to the WHO recommendations for assessing antimalarial drug efficacy. The primary outcome of efficacy was the four category-variable: C 1 = ACPR, C 2 = LPF, C 3 = LCF, and C 4 = ETF, the first category being the best category and the last category being the worst, in terms of health condition. Table 1 describes the results for each arm of anti-malarial drug, in the three centers. The majority of patient outcomes were assigned in the first category, with very few treatment outcomes in the other categories, corresponding to various degrees of therapeutic failures. The protocol was approved by both the Ministry of Public Health and the Ethical National Committee. The person legally in charge of the children gave an informed consent before inclusion in the study. Study investigators were OCEAC field workers. A global analysis of the data, based on the rate of ACPR can be found in [11].

The proportional odds model
The statistical approach is based on the proportional odds model [12]. Consider the K = 3 centers. Each patient had a response into one of the m = 4 categories, (C 1 , .., C m ), with n i patients in center i. Let be the total number of patients overall. Denote Z the response and π ijc , the probability that the jth subject in study i had a response in category c; π ijc = Pr(Z ij = c). Let also Q ijc = Pr(Z ij ≤ c) be the cumulative probability of a response in category c or better, so that Q ijc = π ij1 + ... + π ijc and Q ijm = 1; c = 1, ..., m-1. Hence c = m corresponding to the worst outcome (here an ETF) is the baseline level. The proportional odds model is defined by where α c represents the cth intercept and η ij = γ 1 x 1ij + ...+ γ p x pij , is a linear combination of explanatory variables. The model assumes 1) an increasing order for the α c : α 1 <α 2 < ... <α m-1 ; 2) α 0 = -∞ and α m = ∞.
The assumption of proportional odds signifies that the ratio of the odds of the event Z ij ≤ c for any pair of sets of explanatory variables is independent of the category c. The cumulative logit model (1) results from the existence of a latent random variable with a logistic distribution, the (α c , c = 0, ..., m) corresponding to the cut-off thresholds for categorizing the latent variable into m classes. Alternatives to the proportional odds model can be developed based on different assumptions regarding the latent variable, like a normal-or an extreme value distribution associated with the probit or complementary loglog links, respectively. Fixed effect model The basic model extends the proportional odds model (PO) proposed in [7] to a situation with L treatments (L > 2). For the sake of simplicity, the presentation will be restricted to the situation of L = 3 treatments corresponding to the real data set but the extension to more than 3 treatments is straightforward. Assuming that one particular treatment (here, AQ) was selected as the reference treatment, the treatment covariate is modeled using L-1 = 2 dummy variables T 1 and T 2 , where T l (l = 1, L-1) yields 1 if subject received the treatment l, and 0 otherwise (here, l = 1 corresponds to SP, l = 2 corresponds to AQ + SP) where {j = 1, ..., n i ,} corresponds to the individuals within each arm, {i = 1, .., K}, corresponds to the centers and {c = 1, ..., m-1} to the categories. The parameter γ 1 represents the log-odds ratio of efficacy associated to (1) (2) treatment 1 versus the reference treatment, and γ 2 the log-odds ratio of efficacy associated to treatment 2 versus the reference treatment. The center effects are modeled by γ 0i where γ 01 = 0 for identifiability constraint. Testing the proportional odds assumptions It is worth noting that model (2) assumes that, within each center and for each outcome category c, (c = 1, .., m-1), there is a common log-odds ratio, γ 1 for treatment l versus the reference treatment. Similarly, model (2) assumes also that, within each treatment group, and for each outcome category c there is a common log-odds ratio, γ 0iγ 0i , between two centers i and i', independent of the outcome category.
Additional models were introduced to test the PO assumption. Model (3) allows for testing a global PO between treatments, without a center effect: However, each treatment can have a different effect according to the outcome category.
Assuming a global PO for T 1 , the model (4) allows for testing PO between T 2 and the reference treatment where δ hc = 1 if h = c and 0, otherwise. Similarly, the model (5) assumes a global PO for T 2 , and allows for testing PO between T1 and the reference treatment (here AQ) Mixed effects model Discarding a center effect, and still assuming proportional odds between treatments, 2 random effects were added to the treatment variables.
By grouping separately the fixed and the randomeffects, the model can be rewritten as follows: where , where G', the transposed matrix of G, needs to be estimated from the data.

Estimation methods
The parameters of the fixed effects models were estimated using three different methods, i.e. maximum likelihood (ML), iterative generalized least squares (IGLS) and generalized estimating equations (GEE) [7]. In the IGLS and GEE methods, the ordinal response was considered as a set of correlated binary responses. Thus the proportional odds model is viewed as a member of the multivariate generalized linear models [13]. All these methods were implemented under the free Statistical Software R http://cran.r-project.org/. The various codes for the ML, GEE and IGLS methods are in the additional file 1. Models (3), (4) and (5) were implemented for the test of the proportional odds assumption between centers and treatments. The parameters of the random effects model were estimated using a Bayesian approach for hierarchical models. In the Bayesian paradigm, prior distributions for all unknown parameters θ need to be specified. The intercept α c were given noninformative N(0,10 4 ) priors. Parameters β 1i and β 2i were assumed to be from N(γ 1 , σ 1 2 ) and N(γ 2 , σ 2 2 ) respectively. We gave γ 1 , γ 2 non-informative N(0,10 -6 ). The difficulty in the Bayesian paradigm is the choice of the a priori distribution for the inverse of the variances σ 1 2 and σ 2 2 . Several forms of the a priori are available. We chose an a priori of the form Gamma(0.5η, 0.5ηζ 2 ), where η and ζ 2 are either arbitrary chosen or calibrated from the data. Here we chose η = 2. The initial estimated values of ζ 2 (0,007, 0,001) were obtained from our data, by adjusting a generalized linear mixed model. So 1/σ 1 2 and 1/σ 2 2 were given Gamma(1,0.007) and Gamma(1,0.001), respectively. All the Bayesian analyses were performed using 3 chains of 50000 samples, out of which the first 25000 were removed to allow for convergence. Thus, each posterior statistic was based on 75000 samples. Finally the 95% credibility interval was obtained for each statistic.

Simulations
To assess the performance of the fixed effect methods, we carried out analyses on simulated data based on model (6). The aim was to compare the methods when the treatment covariate was taken into consideration and random effects were introduced to account for center variability. We generated a data set of 3 centers, each containing 100 patients, comparing 3 treatments A, B, C, with equal size. We supposed that the treatment effects were the same in the 3 centers, and were set to be γ 1 = -1 for B comparing to A and, γ 2 = 1 for C comparing to A, respectively. Initial values for α c were α 01 = -2, α 02 = -1, α 03 = 0. The initial value of the matrix G' was a diagonal matrix with diagonal elements σ 1 = 0.01 and σ 1 = 0.02, corresponding to the between center deviation.
The choice of the treatment initial values was guided by preliminary results from real data. The choice for the cutpoint values was derived from [14].
Cumulative probabilities Q ijc were then obtained using model (6). For a subject j in study i, with vector of cumulative probabilities (0, Q ij1 , Q ij2 , Q ij3 , 1), the simulated outcome was generated using the quartile range category, to which an uniformly (0,1) distributed random value belonged. To check that the conclusions were not sensitive to the choice of the starting values, we repeated the simulations, for each method, with different initial values. An example of the simulated data set can be found in additional file 2.
For each method, bias and confidence levels were estimated through N 1000 = replications. For each estimation method, the treatment effect was assessed using the calculated coverage probability and the type I error.
In the Bayesian analysis, model (6) was implemented using the Bayesian Software WinBUGS. A program is provided in the additional file 3, showing how model (6) can be written in BUGS language.

Simulated data set
Results are summarized in Table 2. The ML-, IGLS-and GEE-methods yielded similar results. For the detection bias, the ML estimate appeared less biased than the two other methods. Considering their coverage probabilities (CP = 1β), all the three approaches appeared to be able to detect a significant treatment effect with a given type I error α.

Results based on treatment and center effect only
Among the n = 538 patients from the three centers, 19 patients were lost to follow-up by day 14 and hence could not contribute to the analyses. Results from the randomized trial are displayed in Table 3 with a general view of the fixed and the Bayesian approach. All the methods showed a significant decrease of the SP effect as compared with AQ. The SP treatment appeared less efficacious than AQ, by all the three fixed methods (ML: OR = e -1.93 = 0.14; 95% CI 0.04-0.50; IGLS: OR = e -1.77 = 0.17; 95% CI 0.04-0.58; GEE: OR = e -1.80 = 0.16; 95% CI 0.04-0.57). The combination AQ-SP was not significantly different by the three approaches. The additional file 4 shows how practically these fixed estimates can be obtained with an example of the different output given in additional file 5.
The between center variability was = 0.023 (95% credibility interval 0.001-0.107) for the SP/AQ treatment, and = 0.020 (95% credibility interval 0.0001-0.024) for the AS+SP/AQ treatment. The assumption of proportional odds between treatments and centers was tested using the ML method by comparing the change in deviance of the various models (Table 4) to the chi-squared χ 1-α 2 (2 df ) which equals 5.99. The results indicate that the assumption was satisfactory.

Results after covariates adjustment
Covariates known at the time of inclusion were the parasitaemia, the type of treatment, the gender, the patient's age, the bodyweight (kg), a previous treatment before the study enrollment, the haematocrit and the baseline body temperature. The patient's bodyweight was log-transformed; the initial parasitaemia counts were transformed to normality using a Box-Cox transformation (estimated power = 0.1028, p-value = 0.0056). The baseline haematocrit was categorized into two classes, below (A) and above (B) 25%. The patient's age was also categorized into 3 categories: i) less than 1 year old (A); ii) between 2 and 4 years old (B); iii) more than 5 years old (C). First, a univariate analysis including each covariate of interest, in addition to the treatment covariates and the  Fixed models. Column 2 corresponds to the real values used for simulating data. Columns 3-5 correspond to the mean estimates according to different methods with the corresponding standard error estimates in brackets; columns 6-8 correspond to the corresponding absolute and relative biases (*); columns 9-11 corresponds to the coverage probability and, in brackets, the type I error for the treatment effect only.
centre covariate was performed. Then only covariates with a p-value under 10% were selected for a multivariate analysis. Only the baseline temperature reached a statistical significance (Table 5). Multivariate analysis did not change the magnitude of the treatment effects. As all the estimated intercept values were strictly positive, the assumption of a logistic distribution for the latent variable could be replaced with an extreme value distribution, using a model with a complementary log-log (cloglog) link function [2]. In Table 6, the corresponding results show that the intercept value-estimates were reduced as compared to the corresponding ones using the logistic distribution. However, the treatment parameter estimates did not change although the interpretation is not strictly the same. Indeed, as pointed out by Agresti [2], trying a complementary log-log link function is

Discussion
The present work applied methods, initially presented in the context of a meta-analysis comparing 2 treatment effects with a categorical outcome, to the analysis of multi-center-antimalarial trials, comparing 3 treatments, with a 4-categorical primary outcome, using individual patient data. All the methods were implemented using free available softwares. Their performances were assessed using simulated data sets. Both a frequentist and Bayesian approaches were implemented. In the simulation settings, we considered a general case of a mixed effects model. The ML method was faster than the IGLS and GEE methods. All the methods produced minimal biases. Another R function can be used like lrm (Hmisc library using the library Design) which is more restricted at present in the choice of the link functions.
Our results from the multi-centre trial showed a significant decrease in efficacy in the sulfadoxinepyrimethamine-arm as compared with the amodiaquinearm. This means that there was less ACPR and more failures with SP. No significant difference was observed between amodiaquine and the combination AQ + SP. Similar results hold for both the frequentist and Bayesian approaches. The proportional odds assumption between treatments was not rejected. The proposed model can be a useful way when dealing with a small number of centers with more than 2 treatments.
Regarding the dependence between the outcome and the explanatory variables, a recent study [15] reported patient's age as a significant factor associated with antimalarial treatment failure, more specifically with artemisinin derivatives and quinine. Indeed, the study showed that children less than 5 years old had a signifi- cant risk of treatment failure as compared with children more than 5 years. In our analysis, no significant age effect was found, but our data concerned mostly children less than 5 years. Covariates, like haematocrit, gender, auto-medication before study enrollment, were not significant. In contrast, the baseline temperature was significantly associated with the outcome, showing that patients with a low temperature at inclusion were more likely to have a good treatment response. For the SP treatment, the decrease could be related to a gene mutation, responsible for a resistance towards SP [16][17][18]. The SP resistance was also confirmed in a recent study in South Africa [19]. The emergence of resistance led to the use of Artemisinin-based Combination Therapies (ACTs) which play an important role in the reduction of malaria transmission [20]. Thus, ACTs are widely used as firstline treatment of malaria in Africa [21][22][23][24][25][26][27][28].
The present work presented some statistical methods to handle the WHO criteria globally. They appear adapted for assessing antimalarial treatment in clinical trials, or in other studies with an ordinal outcome. It contrasts with previous methods [15], which reduced the outcome into a binary variable, for instance ACPR versus non ACPR. Only 3.5% were missing data. By eliminating drop-out or excluded patients in the final analysis, we assumed that they were missing completely at random (MCAR). Incorporating these into the analysis would require an imputation of the missing category. For such imputation, wide-range methods have been studied [29], but remain a topic to study in the case of malaria.
Other study designs are presently available in which patients have repeated standardized visits during their follow-up. Thus repeated categorical responses are available for each subject. All methods used in the case of a single categorical response, can however, be extended to repeated categorical responses [30].

Conclusion
The approach of modeling the WHO criterion as an ordinal criterion is innovative in the sense that, it gives a gain of information and precision on the estimate of the treatment effect using a global analysis of the four-category outcome instead of either comparing the rate of ACPR only, without being able to differentiate treatments when they present an identical percentages in the ACPR category but, different repartitions in the remaining three other categories, or repeating a similar binary analysis for each category, then exposing to the increase of the type-1 error. This analysis could help in avoiding a conclusion on an equal treatment effect whereas there is not or, viceversa. In addition, this approach is able to take into account individual's covariates to improve the prediction of efficacy.