 Research article
 Open Access
 Published:
Evaluating antimalarial efficacy in singlearmed and comparative drug trials using competing risk survival analysis: a simulation study
BMC Medical Research Methodology volume 19, Article number: 107 (2019)
Abstract
Background
Antimalarial efficacy studies in patients with uncomplicated Plasmodium falciparum are confounded by a new infection (a competing risk event) since this event can potentially preclude a recrudescent event (primary endpoint of interest). The current WHO guidelines recommend censoring competing risk events when deriving antimalarial efficacy. We investigated the impact of considering a new infection as a competing risk event on the estimation of antimalarial efficacy in singlearmed and comparative drug trials using two simulation studies.
Methods
The first simulation study explored differences in the estimates of treatment failure for areas of varying transmission intensities using the complement of the KaplanMeier (KM) estimate and the Cumulative Incidence Function (CIF). The second simulation study extended this to a comparative drug efficacy trial for comparing the KM curves using the logrank test, and Gray’s ksample test for comparing the equality of CIFs.
Results
The complement of the KM approach produced larger estimates of cumulative treatment failure compared to the CIF method; the magnitude of which was correlated with the observed proportion of new infection and recrudescence. When the drug efficacy was 90%, the absolute overestimation in failure was 0.3% in areas of low transmission rising to 3.1% in the high transmission settings. In a scenario which is most likely to be observed in a comparative trial of antimalarials, where a new drug regimen is associated with an increased (or decreased) rate of recrudescences and new infections compared to an existing drug, the logrank test was found to be more powerful to detect treatment differences compared to the Gray’s ksample test.
Conclusions
The CIF approach should be considered for deriving estimates of antimalarial efficacy, in high transmission areas or for failing drugs. For comparative studies of antimalarial treatments, researchers need to select the statistical test that is best suited to whether the rate or cumulative risk of recrudescence is the outcome of interest, and consider the potential differing prophylactic periods of the antimalarials being compared.
Background
The primary endpoint in clinical studies of uncomplicated Plasmodium falciparum malaria is the occurrence of recrudescent parasitaemia, defined as recurrence due to the same parasite which caused the original infection. Parasite recurrence due to a heterologous parasite, which can either be a new infection with P. falciparum or another species of Plasmodia can potentially preclude the occurrence of recrudescence and constitute a competing risk event [1, 2]. Such scenario can occur when the parasite load of a newly acquired infection (regardless of the species or strain) outnumbers and outcompetes the low level of parasitaemia of an existing infection. A recrudescence can also be precluded when the new infection is due to a more resistant parasite strain compared to the existing susceptible parasite. These scenarios further depend on the inoculum density and the multiplication rates (efficiency) of the newly emergent infection and of the existing recrudescent parasites.
Despite advancement in statistical methods for analysing time to event outcomes [1,2,3,4,5,6,7], competing risk events are often ignored in the medical literature. Recent reviews have pointed out that a vast majority of studies published in high impact medical journal are susceptible to competing risk biases [8,9,10], and malaria is no exception. The KaplanMeier (KM) survival analysis (\( {\widehat{S}}_{KM}(t) \)) is currently recommended by the World Health Organization (WHO) for deriving antimalarial efficacy [11, 12]. Commonly the complement of the KM estimate (\( {\widehat{F}}_{KM}(t)=1{\widehat{S}}_{KM}(t) \)) is reported as the WHO recommends replacing a firstline treatment with an alternative regimen if the derived estimate of cumulative failure exceeds 10% [12].
The complement of the KM estimate provides an estimate of the marginal risk (of recrudescence), i.e. the risk of recrudescence where new infections do not occur. However, this is only possible when all enrolled participants are admitted to a hospital setting where it is not possible to get another mosquito bite, and thus, new infection. In practice, antimalarial trials are almost invariably conducted in endemic settings where new infections occur frequently and can be observed in as high as 50% of the cases [13]. The Cumulative Incidence Function (CIF) estimator proposed by Kalbfleisch and Prentice provides an alternative approach to estimate the cumulative failure by accounting for such competing risk events [14]. Several studies have compared the cumulative failure estimates derived by the complement of KM method against the CIF estimator and have reported that the KM approach leads to an overestimation of cumulative failure in the presence of competing risk events [9, 15,16,17,18].
The presence of competing risk events have further implications in comparative studies. Comparative antimalarial studies utilise the logrank test for comparing the efficacy of two drugs. The logrank test is essentially the comparison of the underlying causespecific hazard rate between two groups [19] (see Additional file 1, Section 1 for definitions). In the absence of competing risk events, there is a onetoone correspondence between the causespecific hazard rate and the cumulative risk. This means that any inference drawn upon the hazard function holds equivalently true for the survival function and the cumulative risk. However, in the presence of competing risk events, this onetoone relationship no longer holds true [20]. In such a scenario, inferences drawn using the logrank test for comparing the equality of causespecific hazard rates may not be valid when the interest is in comparing the cumulative risk of failure at time t. An alternative approach, which compares the difference in cumulative risks between two groups accounting for competing risk events, is the Gray’s ksample test [21]. This is the usual logrank test where the causespecific hazard function is replaced by the hazard of the subdistribution [22].
To date, there has been no comprehensive investigation of how new infections impact the analysis and interpretation of efficacy data in antimalarial trials of uncomplicated P. falciparum malaria. This simulation study aimed to address this gap and there were two specific objectives:

I.
To quantify the magnitude of overestimation in cumulative risk of treatment failure derived by the complement of the KaplanMeier approach compared to the Cumulative Incidence Function in a singlearmed antimalarial trial, and

II.
To quantify the influence of new infections on the comparative efficacy between antimalarial drugs, by comparing two statistical tests, the logrank test and Gray’s ksample test
Methods
Two simulation studies were carried out to explore the utility of competing risk survival analysis in single armed and comparative antimalarial drug trials. The generation of survival data is common to both of these studies and is described first.
Generation of survival data
The time to parasitic recurrences were simulated from baseline hazard functions reflective of underlying biological mechanism of recrudescence and new infection (Fig. 1). The hazard functions were derived from individual patient outcome data from 15 studies with 4122 children aged less than 5 years for the antimalarial regimen dihydroartemisininpiperaquine (DP). The existing studies analysed had an average efficacy of 95% in a sensitive parasite population. Fractional polynomials were used to capture nonmonotonous relationship between the log of the cumulative instantaneous hazard and time to recrudescence (new infection) in order to generate survival data (manuscript currently under preparation). We then varied the intercept parameters in these two functions to explore specific scenarios outlined in the simulation studies I and II. The following cumulative baseline hazard (CBH) functions (on log scale) were used for the generation of time to recrudescence (rc), and time to new infection (ni), respectively:
The parameters β_{0} and α_{0} represent the intercept and were varied to achieve the desired proportion of recrudescence and new infection.
Simulation study I: aim, design and setting
The first simulation study aimed at quantifying the magnitude of overestimation in cumulative risk of treatment failure derived by the complement of the KaplanMeier method compared to the Cumulative Incidence Function in a singlearmed antimalarial trial.
The following combination of parasitic recurrences were generated: recrudescent proportion (5, 10, and 15%) and new infection proportion (< 10%, 10–20%, 20–40% and > 40%). The base case simulation of 5% recrudescence represents the scenario of high efficacy currently observed with the artemisinin combination therapies in Africa [23,24,25]. The scenarios of 10 and 15% recrudescence represent the situations likely to be observed when antimalarial drug resistance worsens, which has now been observed for some antimalarials in Cambodia and Vietnam [26,27,28]. New infection proportions of < 10%, 10–20%, 20–40% and > 40% progressively represent areas of very low, low, moderate and high malaria transmission settings. Standard sample size calculations are not relevant for the methodological comparisons as the aim was to compare the derived estimates of cumulative risk of treatment failure from the two methods. Trials of sample size 100, 200, 500 and 1000 patients were simulated. Sample sizes of 100 and 200 were chosen to reflect the scenarios frequently observed in antimalarial studies.
The following steps describe the simulation protocol:

i.
Simulate time to recrudescence (t_{1}) using eq. (1). The parameter β_{0} was varied to achieve the desired proportion of recrudescence:

β_{0} = − 3.7092 for approximately 5% recrudescence by day 63 (base case scenario for recrudescence)

β_{0} = − 3.0160 for approximately 10% recrudescence by day 63

β_{0} = − 2.6105 for approximately 15% recrudescence by day 63

ii.
Simulate time to new infections (t_{2}) using eq. (2). The parameter α_{0} was varied in order to achieve the desired proportion of new infections:

α_{0} = − 5.6004 for approximately < 10% new infection by day 63

α_{0} = − 3.9909 for approximately 10–20% new infection by day 63

α_{0} = − 3.2978 for approximately 20–40% new infection by day 63

α_{0} = − 2.8924 for approximately > 40% new infection by day 63

iii.
Since early recurrences are very unlikely in patients with adequate drug exposure [25, 29], the minimum time was set to day 14 and administrative censoring was applied on the last scheduled followup visit (day 63). For simplicity, no losses to followup were assumed.

iv.
For each individual, the observed time (t) was defined as the minimum of the simulated time to recrudescence (t_{1}) and new infection (t_{2}).

v.
The final observed time was rounded to the nearest weekly visit day (7, 14, 21 and so on), reflective of the antimalarial followup design. The observed event corresponded to the event with minimum time, t, else administrative censoring was applied on day 63.

vi.
For each simulated dataset, the cumulative probability of failure was estimated on days 28, 42 and 63 using the 1 minus KM method and the CIF. New infections were censored on the day of occurrence in the 1KM analysis and were kept as a separate category of competing risk event when estimating the CIF.

vii.
The absolute and relative differences in the two estimators derived in step (vi) were calculated.

viii.
For each scenario, steps (i)(vii) were repeated 1000 times using an acceptance sampling procedure where only datasets fulfilling the study criteria were kept (e.g. 5% recrudescence, < 10% new infection). Studies where 4–6%, 9–11% and 14–16% of recrudescences were observed were defined to have 5, 10 and 15% recrudescence, respectively. In order to achieve the desired proportion of recrudescences (approximately 5, 10 and 15%), this required a large number of simulation runs, and the first 1000 datasets fulfilling the criteria were kept for analysis.
Simulation study II: aim, design and setting
The second simulation study aimed to quantify the influence of new infections on the comparative efficacy between antimalarial drugs, by comparing two statistical tests, the logrank test and Gray’s ksample test.
Let drug A be the current first line treatment and drug B be a new antimalarial drug under investigation. The interest is in establishing whether drug A and B are different in terms of their effect on recrudescence. The aim of the simulation was to present the results from the logrank test for comparing the equality of the KM curves of drug efficacies and Gray’s ksample test for comparing the cumulative risks of recrudescence for drug A and drug B at day 63. For the logrank test, new infections were censored on the time of recurrence.
Let \( {\lambda}_1^A(t) \) be the causespecific hazard function of recrudescence for drug A and \( {\lambda}_2^B(t) \) be the causespecific hazard function for drug B at time t. The null hypothesis under consideration for the logrank test is H_{0}:
Let \( {F}_1^A(t) \) and \( {F}_2^B(t) \) be the CIF of recrudescence for drug A and drug B respectively at time t. The null hypothesis under consideration for the Gray’s ksample test is I_{0}:
The following hazard ratio \( \left({\theta}_{rc}=\frac{\lambda_1^A(t)}{\lambda_2^B(t)}\right) \) of recrudescence (RC) for drug A relative to drug B was assumed:

θ_{rc}= 1.00 drug B has the same effect on RC as drug A

θ_{rc} = 2.72 drug B is associated with increased hazard of RC compared to drug A

θ_{rc} = 0.37 drug B is associated with decreased hazard of RC compared to drug A
Similarly, the following hazard ratio (θ_{ni}) of new infection (NI) for drug A relative to drug B was assumed:

θ_{ni}= 1.00 drug B has the same effect on NI as drug A

θ_{ni}= 2.72 drug B is associated with increased hazard of NI compared to drug A

θ_{ni}= 0.37 drug B is associated with decreased hazard of NI compared to drug A
θ_{ni}= 1.00 represents a null scenario, θ_{ni} = 2.72 represents a scenario where the new drug has a shorter terminal elimination halflife compared to the existing drug and thus exerts a shorter prophylactic effect, while θ_{ni} = 0.37 represents a scenario where the new drug is associated with a longer posttreatment prophylaxis than the reference drug.
Nine different possible scenarios of drugs A and B were explored in this study (Table 1, Fig. 2). Some of these scenarios presented might not be plausible in antimalarial studies and were kept for completeness as such scenarios might be applicable for other therapeutic interventions [30]. For antimalarial studies, we consider the scenarios where when drug B, compared to drug A exerts unidirectional effect i.e. associated with increased (or decreased) risk of both recrudescence and new infection as the most likely scenario. Similarly, a partially null scenario can be considered likely to be observed in antimalarial trials. For example, when drug A with a short halflife and drug B with a long halflife are compared, then despite observing similar efficacy, it can be expected that more new infections will be observed with drug A (Scenario 1B in Table 1).
Since this simulation was setup to evaluate type I error when comparing the two drugs, the number of patients needed per arm to detect a difference of a given loghazard ratio was calculated. A sample size of 500 patients per arm was found to be adequate across all the simulation scenarios studied assuming 80% power for three different loghazard ratios (Additional file 1, Section 2). However, as for simulation study I, we repeated the simulation for n = 100, 200, 500 and 1000 subjects/arm for completeness.
The following steps describe the simulation protocol for each scenario:

i.
For each drug arm, time to recrudescence (t_{1}) was simulated for 500 hypothetical patients using eq. (1). Since drug A is the reference treatment, its intercept parameter was held constant at − 3.7092 for all the simulation scenarios. The intercept parameter for drug B was varied to simulate the scenario of null effect (− 3.7092), increased effect (− 2.7092) or decreased effect (− 4.7092) of drug B on recrudescence relative to drug A. The corresponding hazard functions for different scenarios studied are presented in Fig. 2.

ii.
For each drug arm, time to new infection (t_{2}) was simulated for 500 patients using eq. (2). Since drug A is the reference treatment, its intercept parameter was held constant at − 2.8924 for all the simulation scenarios. The intercept parameter for drug B was varied to simulate the scenario of null effect (− 2.8924), increased effect (− 1.8924) or decreased effect (− 3.8924) of drug B on new infection relative to drug A. The corresponding hazard functions for different scenarios studied are presented in Fig. 2.

iii.
Repeat steps (iiiv) as outlined in simulation study I

iv.
The difference between drugs A and B in terms of cumulative recrudescence were tested using the logrank test at day 63 by censoring the new infections. The equality of CIFs for the two regimens was tested using Gray’s ksample test where a new infection was considered a competing risk event. Pvalues and the associated chisquared test statistic were extracted. The hazard ratio for drug A relative to drug B was estimated using the Cox regression model.

v.
The above simulations were repeated 1000 times and the proportion of times the derived pvalue from logrank test and Gray’s ksample test was less than 0.05 was calculated. This is equal to the rejection of the null hypothesis that there is no difference between the two treatment regimens in terms of the risk of recrudescence.
Software
The time to recrudescence and new infection were generated using the survsim package in Stata [31] (See Additional file 1, Section 3 for Stata codes). The logrank test was carried out using the survdiff function in the survival package and Gray’s ksample test was performed using the cuminc function in the cmprsk package in R software (Version 3.2.4) [32].
Results
Simulation study I
The findings of this simulation study are presented in Figs. 3 and 4, and Table 2. The 1 minus KM was associated with an overestimation of cumulative failure in all the scenarios studied. The magnitude of the overestimation increased with i) increasing proportion of new infections, ii) increasing proportion of recrudescences, and iii) the study followup duration (Fig. 3).
In the areas of low transmission (< 10% observed new infection), the maximum overestimation in the derived cumulative risk of recrudescence on day 63 was 0.16% when drug exhibited 95% efficacy (base case scenario), however as the drug efficacy fell to 85%, the difference in estimates increased to 0.46%. In the high transmission areas (> 40% new infections), the maximum absolute overestimation by the 1KM method was 1.75% for the base case simulation and this rose to 3.13 and 4.30% when the drug efficacy declined to 90 and 85% respectively (Table 2, Fig. 4).
The results when expressed on relative scale exhibited the same trend and conclusion as observed on the absolute scale (Additional file 1, Section 4). The results remained unaffected when the simulation was repeated with sample sizes of n = 100, 200, and 1000 patients (Additional file 1, Section 4).
Simulation study II
For each simulated dataset, the hazard ratio of recrudescence and new infection (for drug B relative to drug A) was estimated using the Cox model with treatment group as a covariate. The distribution of hazard ratios from 1000 simulations is presented in Fig. 5. Table 3 presents the results for the different scenarios considered with sample size of 500 patients per arm, which had at least 80% power to detect the desired hazard ratio for recrudescence between the two drugs across all the scenarios studied.
No difference in recrudescence
In the null situation (Scenario 1A), where it was postulated there was no difference in the risk of recrudescence and risk of new infection between the two drug regimens, both tests achieved their correct size (α) i.e. rejection rate was close to nominal 5%, as expected. Despite there being no difference between the two drugs for both events (as the respective hazard functions for recrudescence and new infections were identical for both drugs), stochastic variations will lead to a rejection of the null hypothesis approximately 5% of the time when the converse is true. In the partially null scenario of 1C i.e. drug B had the same effect on recrudescence as drug A but was associated with decreased hazard of new infection, both tests achieved their correct α. In partially null Scenario 1B, where drug B was associated with increased risk of new infection by a hazard ratio of 2.72, the logrank test correctly achieved its nominal size (5% rejection), but the Gray’s ksample test led to a slightly higher rejection rate (11.9%).
Drug A and B have the same posttreatment prophylaxis
When there was no difference between the drug A and drug B in terms of their posttreatment prophylaxis, but drug B was associated with increased recrudescence with a hazard ratio of 2.72 (Scenario 2A), both tests had similar rejection probability. The median proportion of recrudescence observed in this scenario was 6.5% in drug B compared to 2.5% for drug A. In scenario 2B, where the drug B decreased recrudescence relative to drug A (hazard ratio = 0.37), both tests led to rejection of the null hypothesis 80% of the time.
The most relevant and biologically plausible scenario in an antimalarial trial occurs when a new treatment exerts unidirectional effect on recrudescence and new infection (compared to the reference drug), corresponding to scenarios 3A and 3D. In scenario 3A, where drug B was associated with approximately 2fold increase in both recrudescence and new infection compared to drug A, the logrank test appeared to be the more powerful of the two approaches with rejection probability of 99% compared to 90% with Gray’s ksample test. In situation 3D, where drug B was associated with a median reduction in recrudescence and new infection by approximately 60%, the logrank test again proved to be superior by rejecting the null hypothesis of no difference (between drug A and drug B) 82.8% of the time compared to 71.3% by the Gray’s ksample test (Fig. 6, Panel D). The most interesting difference was observed when drug B exerted a differential effect on recrudescence and new infection, i.e. reduced recrudescence but increased new infection compared to drug A (Scenario 3C). In this situation, the Gray’s ksample test appeared to be the more powerful of the two tests (Fig. 6, Panel C). In Scenario 3B, where drug B was associated with increased recrudescence but reduced new infection, the results of the two tests were again very similar.
Assumption of proportional hazards
In the simulation scenarios studied, the assumption of proportional hazards was violated in 5.4% (490/9000) of the simulated datasets for the comparison of recrudescence, and 4.5% (407/9000) for new infection. The violation of this assumption didn’t seem to affect the results of the tests as the proportion of times this assumption was violated were similar across different scenarios (Additional file 1, Section 5). Increasing the number of simulation runs to 10,000 from 1000 didn’t change the result (Table 3, results from 10,000 simulation runs shown in parenthesis). However, there were small variations in the results when the simulation was repeated with different sample sizes (Table 4).
Impact of sample size
In studies with n = 100, and 200 (which were known to be underpowered from the sample size calculations), both tests achieved their nominal 5% level i.e. rejection probability close to 5% for scenario 1 (Table 4). In scenarios 2 and 3, where the hazards ratio for recrudescence between the two drugs was 2.72 and 0.37, the rejection probability did not reach the required level of 0.8.
As expected, when the sample size was increased to 1000 patients per arm, both tests achieved their nominal size in the null scenario with the exception of Gray’s ksample test for scenario 1B, which rejected the null hypothesis 21.7% despite there being no difference between the two drugs. In this scenario, the influence of sample size was apparent as the rejection probability using Gray’s ksample test progressively increased with an increase in study sample size. Both tests rejected the null hypothesis in nearly all simulations for scenarios 2 and 3.
Discussion
Competing risk survival analysis is increasingly being used in the medical and statistical literature [8, 33]. However, this approach remains novel in the context of antimalarial research [34]. The KM method is the currently recommended approach for deriving antimalarial drug efficacy of uncomplicated P. falciparum malaria. Theoretically, the KM method overestimates the cumulative incidence of recrudescence in the presence of new infection [17]. The magnitude of this overestimation is currently not documented and the implications for comparative efficacy studies is unknown. In order to fill this research gap, we carried out two simulation studies using biologically plausible survival functions consistent with the underlying pharmacokinetics profile of the antimalarial drugs.
The first simulation study quantified the degree of overestimation in cumulative incidence of recrudescence using the naïve 1 minus KM method compared to the CIF in a singlearmed antimalarial trial. The magnitude of the overestimation was found to increase with the increasing proportion of recrudescence, new infection and study followup duration; a finding consistent with the statistical and medical literature [16, 17]. The simulation study suggested that the estimates from the two approaches differed by less than 0.1% for most of the scenarios presented in Table 2; such differences are unlikely to have clinical consequences. In a scenario which reflected the current observations of drug efficacy with artemisinin combination therapies (> 95%), the overestimation was negligible in the areas of low transmission intensities, i.e. new infections lower than 10% (Table 2). For high transmission areas, this reached a maximum of 1.75%. However, we have also clearly identified several scenarios where the two methods will lead to a substantially different estimate. The magnitude of the overestimation was greatly increased when antimalarial drug efficacy began to decline. At 90% drug efficacy, the absolute deviation in derived estimates reached a maximum of 0.27% in the areas of low transmission and 3.13% for high transmission areas. When the efficacy fell to the low level of 85%, the overestimation reached 4.30% in the areas of high transmission. Similarly, in antimalarial studies, additional treatment is administered on detecting a recurrent parasitaemia. In such a scenario where the recurrence is due to a new infection, which has masked an existing lowdensity parasitaemia of the original infection (recrudescence), this would prevent the potential recrudescence from being observed due to additional antimalarial drugs. This will lead to an underestimation of failure. Taken together, our results highlight that estimation of drug failure in areas of high transmission requires careful attention and the CIF provides an alternative approach for deriving the failure estimates.
The second simulation study explored the results from the logrank test for comparing the causespecific hazard rates and Gray’s ksample test for comparing the cumulative incidences in comparative drug trials. A total of nine different hypothetical scenarios on how a new drug B might affect the recrudescence and new infection compared to an existing drug A were explored (Table 1). There were contrasting differences in two out of the nine scenarios. When drug B, compared to drug A, was associated with increased (or decreased) risk of both recrudescence and new infection, we found that logrank test was more powerful compared to Gray’s ksample test for detecting differences between the two treatments. However, when drug B had higher risk of recrudescence and lower risk of new infection (or vice versa) compared to drug A, then Gray’s ksample test was more powerful in detecting the differences between the two drugs in terms of primary endpoint (Table 3). This finding is consistent with the results reported by two previous simulation studies in statistical literature [18, 30]. However, it must be stressed that the latter scenario is less likely to be observed within the context of comparing antimalarial regimens in a reallife situation.
Our simulation study has a number of methodological limitations. First, time to recrudescence and new infection were generated assuming independence. While this greatly simplified the simulation settings, this is an assumption unlikely to be verified and carrying out simulation studies accounting for correlation between recrudescence and new infections remained beyond the scope of this work. Second, we assumed no losses to followup for simplicity. A loss to followup of approximately 20% is anticipated in antimalarial studies and this can be incorporated in the simulation studies as future work. Third, when simulating time to recrudescence, we used rejection sampling and kept the first 1000 observations with 4–6%, 9–11% and 14–16% recrudescence for the scenarios of 5, 10 and 15% recrudescence, respectively. This approach might have led to less variability between the 1000 simulated datasets. Fourth, in simulation study II, we simulated data based on reference drug A assuming low failure in the areas of low transmission (2.5% recrudescence and 21.4% new infections). Hence, the generalisability of results for comparative studies in areas of different transmission settings might be limited. And finally, this manuscript has focused on the point estimation of the derived failure estimates. However, we would like to emphasise that the uncertainty around the point estimates (associated 95% confidence interval) be given as equal importance as the point estimate.
Our results have important clinical consequences. The current WHO strategy for monitoring and evaluation of antimalarial drug efficacy uses a series of thresholdbased approaches. For new drugs to be eligible for introduction as a first line treatment, derived failure estimates should be less than 5%, and for current first line treatments, the failure estimates should not exceed 10% [35]. The results presented in Fig. 4 highlighted the implications for drug policy usage when the derived estimates are at the cusp of these thresholds. The derived estimate of cumulative failure was greater than 5% (Fig. 4a) and 10% (Fig. 4b) when the KM method was used, but remained below 5 and 10% respectively when using the competing risk survival analysis approach, i.e. the CIF. This highlights that ignoring the competing risk of new infections can result in potentially misleading conclusions being drawn from a clinical study, particularly in high transmission settings where a large fraction of patients may develop new infections during the followup period, thus confounding the derived efficacy estimates. Similarly, the effect of competing events has implications for not only standalone trials but also comparative drug trials, particularly when the partner component of the artemisinin combination therapies are eliminated at different rates. For example, lumefantrine, the partner drug in artemetherlumefantrine (AL), has an elimination halflife of 4 days and hence almost all antimalarial activity is subtherapeutic within 16 days [36]. Conversely the elimination halflife of piperaquine (partner drug in dihydroatemsininpiperaquine (DP)) is four weeks and it exerts prolonged post treatment prophylaxis, reducing the risk of recurrent infections for up to 42 days [36]. Hence, the observed proportion of competing risk events is expected to be significantly lower following DP compared to AL, especially in the areas of high transmission. When a large fraction of patients develop new infections, fewer patients are available from which recrudescences can be observed. Hence, it is important that the proportion of competing risk events be taken into consideration when comparing two regimens with different pharmacological properties.
There is an ongoing debate in medical and statistical literature regarding the choice of the method for comparing treatment regimens in the presence of competing risk events [19, 30, 37,38,39]. It is increasingly being advocated that if the research interest is in understanding the biological mechanism of how a treatment affects hazard rate, the logrank test is considered the appropriate method. However, when the interest is in comparison of overall risk i.e. if individuals receiving a particular drug are more likely to experience recrudescence, the comparison of CIF through Gray’s ksample test is considered appropriate [17, 40, 41]. Many authors advocate presenting results of both these approaches to provide a complete biological understanding of the treatment on different endpoints [17, 42]. It is important that researchers are aware that the choice of the analytical method in the presence of competing risk events should be guided by the research question of interest.
Conclusions
Our simulation study showed that 1 minus KM method led to an overestimation of cumulative antimalarial treatment failure compared to the CIF and the degree of overestimation was far greater in high transmission areas. In the areas where a large proportion of recurrences are attributable to new infections, the use of CIF should be considered as an alternative approach for the derivation of failure estimates for antimalarial studies. For comparative studies of antimalarial treatments, the choice of the statistical test should be guided by whether the rate or cumulative risk of recrudescence is the outcome of interest.
Abbreviations
 \( {\widehat{S}}_{KM}(t) \) :

KaplanMeier estimates of drug efficacy at time t
 \( {\widehat{F}}_{KM}(t) \) :

The complement of KaplanMeier estimate\( \left[1{\widehat{S}}_{KM}(t)\right] \)
 CBH:

Cumulative baseline hazard
 CIF:

Cumulative Incidence Function
 HR:

Hazards Ratio
 KM:

KaplanMeier
 NI:

New infection
 RC:

Recrudescence
 WHO:

World Health Organization
References
Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–54.
Wolbers M, Koller MT, Stel VS, Schaer B, Jager KJ, Leffondre K, et al. Competing risks analyses: objectives and approaches. Eur Heart J. 2014;35:2936–41.
Blower S, Bernoulli D. An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it. Rev Med Virol. 2004;14:275–88.
Evelyn F, Jerzy N. A simple stochastic recovery of relapse death and loss of patients. Hum Biol. 1951;Sep:205–41.
Cornfield J. The estimation of the probability of developing a disease in the presence of competing risks. Am J Public Health. 1957;47:601–7.
Chiang CL. Introduction to stochastic processes in biostatistics. New York, USA: Wiley; 1968.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data; 2002.
Koller MT, Raatz H, Steyerberg EW, Wolbers M. Competing risks and the clinical community: irrelevance or ignorance. Stat Med. 2012;31:1089–97.
Van Walraven C, McAlister FA. Competing risk bias was common in KaplanMeier risk estimates published in prominent medical journals. J Clin Epidemiol. 2016;69:170–3.
Austin PC, Fine JP. Accounting for competing risks in randomized controlled trials : a review and recommendations for improvement. Stat Med. 2017;36:1203–9.
World Health Organization. Assessment and monitoring of antimalarial drug efficacy for the treatment of uncomplicated falciparum malaria. Geneva, Switzerland; 2003.
World Health Organization. Methods for surveillance of antimalarial drug efficacy. Geneva. In: Switzerland; 2009.
Yeka A, Banek K, Bakyaita N, Staedke SG, Kamya MR, Talisuna A, et al. Artemisinin versus nonartemisinin combination therapy for uncomplicated malaria: randomized clinical trials from four sites in Uganda. PLoS Med. 2005;2:0654–62.
Kalbfleisch JD, Prentice RL. Competing risks and multistate models. In: The statistical analysis of failure time data. 2nd ed. New York, USA: John Wiley and Sons Inc; 2002. p. 247–77.
Southern DA, Faris PD, Brant R, Galbraith PD, Norris CM, Knudtson ML, et al. KaplanMeier methods yielded misleading results in competing risk scenarios. J Clin Epidemiol. 2006;59:1110–4.
Lacny S, Wilson T, Clement F, Roberts DJ, Faris PD, Ghali WA, et al. KaplanMeier survival analysis overestimates the risk of revision arthroplasty: a metaanalysis. Clin Orthop Relat Res. 2015;473:3431–42.
Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med. 1999;18:695–706.
Varadhan R, Weiss CO, Segal JB, Wu AW, Scharfstein D, Boyd C. Evaluating health outcomes in the presence of competing risks: a review of statistical methods and clinical applications. Med Care. 2010;48(6 Suppl):S96–105.
Bajorunaite R, Klein JP. Comparison of failure probabilities in the presence of competing risks. J Stat Comput Simul. 2008;78:951–66.
Andersen PK, Geskus RB, De witte T, Putter H. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012;41:861–70.
Gray RJ. A class of Ksample tests for comparing the cumulative incidence of a competing risk. Ann Stat. 1988;16:1141–54.
Klein JP. Competing risks. Wiley Interdisciplinary Reviews: Computational Statistics. 2010;2:333–9.
Worldwide Antimalarial Resistance Network (WWARN) AL Dose Impact Study Group. The effect of dose on the antimalarial efficacy of artemether–lumefantrine: a systematic review and pooled analysis of individual patient data. Lancet Infect Dis. 2015;15:692–702.
The WorldWide Antimalarial Resistance Network (WWARN) ASAQ Study Group. The effect of dosing strategies on the therapeutic efficacy of artesunateamodiaquine for uncomplicated malaria: a metaanalysis of individual patient data. BMC Med. 2015;13:66.
The WorldWide Antimalarial Resistance Network (WWARN) DP Study Group. The effect of dosing regimens on the antimalarial efficacy of DihydroartemisininPiperaquine: a pooled analysis of individual patient data. PLoS Med. 2013;10:1–17.
Leang R, Barrette A, Bouth DM, Menard D, Abdur R, Duong S, et al. Efficacy of dihydroartemisininpiperaquine for treatment of uncomplicated plasmodium falciparum and plasmodium vivax in Cambodia, 2008 to 2010. Antimicrob Agents Chemother. 2013;57:818–26.
Saunders DL, Vanachayangkul P, Lon C. Dihydroartemisinin–Piperaquine Failure in Cambodia. N Engl J Med. 2014;371:484–5.
Phuc BQ, Rasmussen C, Duong TT, Dong LT, Loi MA, Tarning J, et al. Treatment failure of Dihydroartemisinin/Piperaquine for plasmodium falciparum malaria, Vietnam. Emerg Infect Dis. 2017;23:715–7.
WorldWide Antimalarial Resistance Network (WWARN) Lumefantrine PK/PD Study Group. Artemetherlumefantrine treatment of uncomplicated plasmodium falciparum malaria: a systematic review and metaanalysis of day 7 lumefantrine concentrations and therapeutic response using individual patient data. BMC Med. 2015;13:227.
Williamson PR, KolamunnageDona R, Tudur Smith C. The influence of competingrisks setting on the choice of hypothesis test for treatment effect. Biostatistics. 2007;8:689–94.
Crowther MJ, Lambert PC. Simulating biologically plausible complex survival data. Stat Med. 2013.
R: a language and environment for statistical computing. In: R Foundation for statistical computing; 2017. https://www.rproject.org/.
Austin PC, Lee DS, Fine JP. Introduction to the analysis of survival data in the presence of competing risks. Circulation. 2016;133:601–9.
Dahal P, Simpson JA, Dorsey G, Guérin PJ, Price RN, Stepniewska K. Statistical methods to derive efficacy estimates of antimalarials for uncomplicated plasmodium falciparum malaria: pitfalls and challenges. Malar J. 2017;16:430.
World Health Organization. Responding to antimalarial drug resistance. In: World Health Organization. 2017. http://www.who.int/malaria/areas/drug_resistance/overview/en/. Accessed 5 Dec 2017.
World Health Organization. Guidelines for the treatment of malaria: third edition. Geneva, Switzerland; 2015.
Freidlin B, Korn EL. Testing treatment effects in the presence of competing risks. Stat Med. 2005;24:1703–12.
Dignam JJ, Kocherginsky MN. Choice and interpretation of statistical tests used when competing risks are present. J Clin Oncol. 2008;26:4027–34.
Rotolo F, Michiels S. Testing the treatment effect on competing causes of death in oncology clinical trials. BMC Med Res Methodol. 2014;14:1–11.
Pintilie M. Analysing and interpreting competing risk data. Stat Med. 2007;26:1360–7.
Tai BC, Wee J, Machin D. Analysis and design of randomised clinical trials involving competing risks endpoints. Trials. 2011;12:127.
Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all causespecific hazards and cumulative incidence functions. J Clin Epidemiol. 2013;66:648–53.
Acknowledgements
We thank Dr. Marcel Wolbers for several helpful discussions on the topic and Prof. Sir Nick J White for his astute clinical acumen.
Funding
PD is funded by Tropical Network Fund, Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford. The WorldWide Antimalarial Resistance Network (PD, KS, RNP, and PJG) is funded by a Bill and Melinda Gates Foundation grant and the ExxonMobil Foundation. JAS is an Australian National Health and Medical Research Council Senior Research Fellow (1104975). RNP is a Wellcome Trust Senior Fellow in Clinical Science (200909). This work was supported in part by the Australian Centre of Research Excellence on Malaria Elimination (ID# 1134989). The funders did not participate in the study development, the writing of the paper, decision to publish, or preparation of the manuscript.
Availability of data and materials
Data generated and analysed for this study is available from the corresponding author on reasonable request.
Author information
Authors and Affiliations
Contributions
PD, PJG, RNP, JAS and KS conceived the idea and wrote the first draft of the manuscript. PD, JAS and KS designed the simulation study. PD performed all the simulations. All authors read and approved the final version.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Additional text and results (DOCX 130 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Dahal, P., Guerin, P.J., Price, R.N. et al. Evaluating antimalarial efficacy in singlearmed and comparative drug trials using competing risk survival analysis: a simulation study. BMC Med Res Methodol 19, 107 (2019). https://doi.org/10.1186/s1287401907482
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1287401907482
Keywords
 Malaria
 Plasmodium falciparum
 Efficacy
 Competing risk events
 Cumulative incidence function