Evaluating antimalarial efficacy in single-armed and comparative drug trials using competing risk survival analysis: a simulation study

Background Antimalarial efficacy studies in patients with uncomplicated Plasmodium falciparum are confounded by a new infection (a competing risk event) since this event can potentially preclude a recrudescent event (primary endpoint of interest). The current WHO guidelines recommend censoring competing risk events when deriving antimalarial efficacy. We investigated the impact of considering a new infection as a competing risk event on the estimation of antimalarial efficacy in single-armed and comparative drug trials using two simulation studies. Methods The first simulation study explored differences in the estimates of treatment failure for areas of varying transmission intensities using the complement of the Kaplan-Meier (K-M) estimate and the Cumulative Incidence Function (CIF). The second simulation study extended this to a comparative drug efficacy trial for comparing the K-M curves using the log-rank test, and Gray’s k-sample test for comparing the equality of CIFs. Results The complement of the K-M approach produced larger estimates of cumulative treatment failure compared to the CIF method; the magnitude of which was correlated with the observed proportion of new infection and recrudescence. When the drug efficacy was 90%, the absolute overestimation in failure was 0.3% in areas of low transmission rising to 3.1% in the high transmission settings. In a scenario which is most likely to be observed in a comparative trial of antimalarials, where a new drug regimen is associated with an increased (or decreased) rate of recrudescences and new infections compared to an existing drug, the log-rank test was found to be more powerful to detect treatment differences compared to the Gray’s k-sample test. Conclusions The CIF approach should be considered for deriving estimates of antimalarial efficacy, in high transmission areas or for failing drugs. For comparative studies of antimalarial treatments, researchers need to select the statistical test that is best suited to whether the rate or cumulative risk of recrudescence is the outcome of interest, and consider the potential differing prophylactic periods of the antimalarials being compared. Electronic supplementary material The online version of this article (10.1186/s12874-019-0748-2) contains supplementary material, which is available to authorized users.


Background
The primary endpoint in clinical studies of uncomplicated Plasmodium falciparum malaria is the occurrence of recrudescent parasitaemia, defined as recurrence due to the same parasite which caused the original infection. Parasite recurrence due to a heterologous parasite, which can either be a new infection with P. falciparum or another species of Plasmodia can potentially preclude the occurrence of recrudescence and constitute a competing risk event [1,2]. Such scenario can occur when the parasite load of a newly acquired infection (regardless of the species or strain) outnumbers and outcompetes the low level of parasitaemia of an existing infection. A recrudescence can also be precluded when the new infection is due to a more resistant parasite strain compared to the existing susceptible parasite. These scenarios further depend on the inoculum density and the multiplication rates (efficiency) of the newly emergent infection and of the existing recrudescent parasites.
Despite advancement in statistical methods for analysing time to event outcomes [1][2][3][4][5][6][7], competing risk events are often ignored in the medical literature. Recent reviews have pointed out that a vast majority of studies published in high impact medical journal are susceptible to competing risk biases [8][9][10], and malaria is no exception. The Kaplan-Meier (K-M) survival analysis (Ŝ KM ðtÞ ) is currently recommended by the World Health Organization (WHO) for deriving antimalarial efficacy [11,12]. Commonly the complement of the K-M estimate (F KM ðtÞ ¼ 1−Ŝ KM ðtÞ) is reported as the WHO recommends replacing a first-line treatment with an alternative regimen if the derived estimate of cumulative failure exceeds 10% [12].
The complement of the K-M estimate provides an estimate of the marginal risk (of recrudescence), i.e. the risk of recrudescence where new infections do not occur. However, this is only possible when all enrolled participants are admitted to a hospital setting where it is not possible to get another mosquito bite, and thus, new infection. In practice, antimalarial trials are almost invariably conducted in endemic settings where new infections occur frequently and can be observed in as high as 50% of the cases [13]. The Cumulative Incidence Function (CIF) estimator proposed by Kalbfleisch and Prentice provides an alternative approach to estimate the cumulative failure by accounting for such competing risk events [14]. Several studies have compared the cumulative failure estimates derived by the complement of K-M method against the CIF estimator and have reported that the K-M approach leads to an overestimation of cumulative failure in the presence of competing risk events [9,[15][16][17][18].
The presence of competing risk events have further implications in comparative studies. Comparative antimalarial studies utilise the log-rank test for comparing the efficacy of two drugs. The log-rank test is essentially the comparison of the underlying cause-specific hazard rate between two groups [19] (see Additional file 1, Section 1 for definitions). In the absence of competing risk events, there is a one-to-one correspondence between the cause-specific hazard rate and the cumulative risk. This means that any inference drawn upon the hazard function holds equivalently true for the survival function and the cumulative risk. However, in the presence of competing risk events, this one-to-one relationship no longer holds true [20]. In such a scenario, inferences drawn using the log-rank test for comparing the equality of cause-specific hazard rates may not be valid when the interest is in comparing the cumulative risk of failure at time t. An alternative approach, which compares the difference in cumulative risks between two groups accounting for competing risk events, is the Gray's k-sample test [21]. This is the usual log-rank test where the cause-specific hazard function is replaced by the hazard of the sub-distribution [22].
To date, there has been no comprehensive investigation of how new infections impact the analysis and interpretation of efficacy data in antimalarial trials of uncomplicated P. falciparum malaria. This simulation study aimed to address this gap and there were two specific objectives: I. To quantify the magnitude of overestimation in cumulative risk of treatment failure derived by the complement of the Kaplan-Meier approach compared to the Cumulative Incidence Function in a single-armed antimalarial trial, and II. To quantify the influence of new infections on the comparative efficacy between antimalarial drugs, by comparing two statistical tests, the log-rank test and Gray's k-sample test

Methods
Two simulation studies were carried out to explore the utility of competing risk survival analysis in single armed and comparative antimalarial drug trials. The generation of survival data is common to both of these studies and is described first.

Generation of survival data
The time to parasitic recurrences were simulated from baseline hazard functions reflective of underlying biological mechanism of recrudescence and new infection (Fig. 1). The hazard functions were derived from individual patient outcome data from 15 studies with 4122 children aged less than 5 years for the antimalarial regimen dihydroartemisinin-piperaquine (DP). The existing studies analysed had an average efficacy of 95% in a sensitive parasite population. Fractional polynomials were used to capture non-monotonous relationship between the log of the cumulative instantaneous hazard and time to recrudescence (new infection) in order to generate survival data (manuscript currently under preparation). We then varied the intercept parameters in these two functions to explore specific scenarios outlined in the simulation studies I and II. The following cumulative baseline hazard (CBH) functions (on log scale) were used for the generation of time to recrudescence (rc), and time to new infection (ni), respectively: The parameters β 0 and α 0 represent the intercept and were varied to achieve the desired proportion of recrudescence and new infection.
Simulation study I: aim, design and setting The first simulation study aimed at quantifying the magnitude of overestimation in cumulative risk of treatment failure derived by the complement of the Kaplan-Meier method compared to the Cumulative Incidence Function in a single-armed antimalarial trial.
The following combination of parasitic recurrences were generated: recrudescent proportion (5, 10, and 15%) and new infection proportion (< 10%, 10-20%, 20-40% and > 40%). The base case simulation of 5% recrudescence represents the scenario of high efficacy currently observed with the artemisinin combination therapies in Africa [23][24][25]. The scenarios of 10 and 15% recrudescence represent the situations likely to be observed when antimalarial drug resistance worsens, which has now been observed for some antimalarials in Cambodia and Vietnam [26][27][28]. New infection proportions of < 10%, 10-20%, 20-40% and > 40% progressively represent areas of very low, low, moderate and high malaria transmission settings. Standard sample size calculations are not relevant for the methodological comparisons as the aim was to compare the derived estimates of cumulative risk of treatment failure from the two methods. Trials of sample size 100, 200, 500 and 1000 patients were simulated. Sample sizes of 100 and 200 were chosen to reflect the scenarios frequently observed in antimalarial studies.
The following steps describe the simulation protocol: i. Simulate time to recrudescence (t 1 ) using eq. (1). The parameter β 0 was varied to achieve the desired proportion of recrudescence: iii. Since early recurrences are very unlikely in patients with adequate drug exposure [25,29], the minimum time was set to day 14 and administrative censoring was applied on the last scheduled follow-up visit (day 63). For simplicity, no losses to follow-up were assumed. iv. For each individual, the observed time (t) was defined as the minimum of the simulated time to recrudescence (t 1 ) and new infection (t 2 ).
The final observed time was rounded to the nearest weekly visit day (7, 14, 21 and so on), reflective of the antimalarial follow-up design. The observed event corresponded to the event with minimum time, t, else administrative censoring was applied on day 63. vi. For each simulated dataset, the cumulative probability of failure was estimated on days 28, 42 and 63 using the 1 minus K-M method and the CIF. New infections were censored on the day of occurrence in the 1-K-M analysis and were kept as a separate category of competing risk event when estimating the CIF. vii. The absolute and relative differences in the two estimators derived in step (vi) were calculated. viii.For each scenario, steps (i)-(vii) were repeated 1000 times using an acceptance sampling procedure where only datasets fulfilling the study criteria were kept (e.g. 5% recrudescence, < 10% new infection). Studies where 4-6%, 9-11% and 14-16% of recrudescences were observed were defined to have 5, 10 and 15% recrudescence, respectively. In order to achieve the desired proportion of recrudescences (approximately 5, 10 and 15%), this required a large number of simulation runs, and the first 1000 datasets fulfilling the criteria were kept for analysis.
Simulation study II: aim, design and setting The second simulation study aimed to quantify the influence of new infections on the comparative efficacy between antimalarial drugs, by comparing two statistical tests, the log-rank test and Gray's k-sample test. Let drug A be the current first line treatment and drug B be a new antimalarial drug under investigation. The interest is in establishing whether drug A and B are different in terms of their effect on recrudescence. The aim of the simulation was to present the results from the log-rank test for comparing the equality of the K-M curves of drug efficacies and Gray's k-sample test for comparing the cumulative risks of recrudescence for drug A and drug B at day 63. For the log-rank test, new infections were censored on the time of recurrence.
Let λ A 1 ðtÞ be the cause-specific hazard function of recrudescence for drug A and λ B 2 ðtÞ be the cause-specific hazard function for drug B at time t. The null hypothesis under consideration for the log-rank test is H 0 : Let F A 1 ðtÞ and F B 2 ðtÞ be the CIF of recrudescence for drug A and drug B respectively at time t. The null hypothesis under consideration for the Gray's k-sample test is I 0 : The following hazard ratio ðθ rc ¼ 2 ðtÞ Þ of recrudescence (RC) for drug A relative to drug B was assumed: θ rc = 1.00 drug B has the same effect on RC as drug A θ rc = 2.72 drug B is associated with increased hazard of RC compared to drug A θ rc = 0.37 drug B is associated with decreased hazard of RC compared to drug A Similarly, the following hazard ratio (θ ni ) of new infection (NI) for drug A relative to drug B was assumed: θ ni = 1.00 drug B has the same effect on NI as drug A θ ni = 2.72 drug B is associated with increased hazard of NI compared to drug A θ ni = 0.37 drug B is associated with decreased hazard of NI compared to drug A θ ni = 1.00 represents a null scenario, θ ni = 2.72 represents a scenario where the new drug has a shorter terminal elimination half-life compared to the existing drug and thus exerts a shorter prophylactic effect, while θ ni = 0.37 represents a scenario where the new drug is associated with a longer post-treatment prophylaxis than the reference drug.
Nine different possible scenarios of drugs A and B were explored in this study ( Table 1, Fig. 2). Some of these scenarios presented might not be plausible in antimalarial studies and were kept for completeness as such scenarios might be applicable for other therapeutic interventions [30]. For antimalarial studies, we consider the scenarios where when drug B, compared to drug A exerts unidirectional effect i.e. associated with increased (or decreased) risk of both recrudescence and new infection as the most likely scenario. Similarly, a partially null scenario can be considered likely to be observed in antimalarial trials. For example, when drug A with a short half-life and drug B with a long half-life are compared, then despite observing similar efficacy, it can be expected that more new infections will be observed with drug A (Scenario 1B in Table 1).
Since this simulation was set-up to evaluate type I error when comparing the two drugs, the number of patients needed per arm to detect a difference of a given log-hazard ratio was calculated. A sample size of 500 patients per arm was found to be adequate across all the simulation scenarios studied assuming 80% power for three different log-hazard ratios (Additional file 1, Section 2). However, as for simulation study I, we repeated the simulation for n = 100, 200, 500 and 1000 subjects/arm for completeness.
The following steps describe the simulation protocol for each scenario: i. For each drug arm, time to recrudescence (t 1 ) was simulated for 500 hypothetical patients using eq. P-values and the associated chi-squared test statistic were extracted. The hazard ratio for drug A relative to drug B was estimated using the Cox regression model. v. The above simulations were repeated 1000 times and the proportion of times the derived p-value from log-rank test and Gray's k-sample test was less than 0.05 was calculated. This is equal to the rejection of the null hypothesis that there is no difference between the two treatment regimens in terms of the risk of recrudescence.

Software
The time to recrudescence and new infection were generated using the survsim package in Stata [31] (See Additional file 1, Section 3 for Stata codes). The log-rank test was carried out using the survdiff function in the survival package and Gray's k-sample test was performed using the cuminc function in the cmprsk package in R software (Version 3.2.4) [32].

Simulation study I
The findings of this simulation study are presented in Figs. 3 and 4, and Table 2. The 1 minus K-M was associated with an overestimation of cumulative failure in all the scenarios studied. The magnitude of the overestimation increased with i) increasing proportion of new infections, ii) increasing proportion of recrudescences, and iii) the study follow-up duration (Fig. 3).
In the areas of low transmission (< 10% observed new infection), the maximum overestimation in the derived cumulative risk of recrudescence on day 63 was 0.16% when drug exhibited 95% efficacy (base case scenario), however as the drug efficacy fell to 85%, the difference in estimates increased to 0.46%. In the high transmission areas (> 40% new infections), the maximum absolute overestimation by the 1-KM method was 1.75% for the  study I (n = 500 subjects). The figure shows the derived cumulative estimate of recrudescence in three cases from simulation study I where the maximum difference was observed between 1-(K-M) and CIF for 5, 10 and 15% respectively in the areas of very high transmission (> 40% new infections). The absolute difference between the two estimators was 1.8, 3.1 and 4.3% on day 63 respectively for 5, 10 and 15% recrudescence. These three cases are the extreme cases presented in Fig. 3 for the scenarios where > 40% new infections were observed  Fig. 4). The results when expressed on relative scale exhibited the same trend and conclusion as observed on the absolute scale (Additional file 1, Section 4). The results remained unaffected when the simulation was repeated with sample sizes of n = 100, 200, and 1000 patients (Additional file 1, Section 4).

Simulation study II
For each simulated dataset, the hazard ratio of recrudescence and new infection (for drug B relative to drug A) was estimated using the Cox model with treatment group as a covariate. The distribution of hazard ratios from 1000 simulations is presented in Fig. 5. Table 3 presents the results for the different scenarios considered with sample size of 500 patients per arm, which had at least 80% power to detect the desired hazard ratio for recrudescence between the two drugs across all the scenarios studied.

No difference in recrudescence
In the null situation (Scenario 1A), where it was postulated there was no difference in the risk of recrudescence and risk of new infection between the two drug regimens, both tests achieved their correct size (α) i.e. rejection rate was close to nominal 5%, as expected. Despite there being no difference between the two drugs for both events (as the respective hazard functions for recrudescence and new infections were identical for both drugs), stochastic variations will lead to a rejection of the null hypothesis approximately 5% of the time when the converse is true. In the partially null scenario of 1C i.e. drug B had the same effect on recrudescence as drug A but was associated with decreased hazard of new  Table 1 infection, both tests achieved their correct α. In partially null Scenario 1B, where drug B was associated with increased risk of new infection by a hazard ratio of 2.72, the log-rank test correctly achieved its nominal size (5% rejection), but the Gray's k-sample test led to a slightly higher rejection rate (11.9%).

Drug A and B have the same post-treatment prophylaxis
When there was no difference between the drug A and drug B in terms of their post-treatment prophylaxis, but drug B was associated with increased recrudescence with a hazard ratio of 2.72 (Scenario 2A), both tests had similar rejection probability. The median proportion of recrudescence observed in this scenario was 6.5% in drug B compared to 2.5% for drug A. In scenario 2B, where the drug B decreased recrudescence relative to drug A (hazard ratio = 0.37), both tests led to rejection of the null hypothesis 80% of the time.
The most relevant and biologically plausible scenario in an antimalarial trial occurs when a new treatment exerts unidirectional effect on recrudescence and new infection (compared to the reference drug), corresponding to scenarios 3A and 3D. In scenario 3A, where drug B was associated with approximately 2-fold increase in both recrudescence and new infection compared to drug A, the log-rank test appeared to be the more powerful of the two approaches with rejection probability of 99% compared to 90% with Gray's k-sample test. In situation 3D, where drug B was associated with a median reduction in recrudescence and new infection by approximately 60%, the log-rank test again proved to be superior by rejecting the null hypothesis of no difference (between drug A and drug B) 82.8% of the time compared to 71.3% by the Gray's k-sample test (Fig. 6, Panel D). The most interesting difference was observed when drug B exerted a differential effect on recrudescence and new infection, i.e. reduced recrudescence but increased new infection compared to drug A (Scenario 3C). In this situation, the Gray's k-sample test appeared to be the more powerful of the two tests (Fig. 6, Panel C). In Scenario 3B, where drug B was associated with increased recrudescence but reduced new infection, the results of the two tests were again very similar.

Assumption of proportional hazards
In the simulation scenarios studied, the assumption of proportional hazards was violated in 5.4% (490/9000) of the simulated datasets for the comparison of recrudescence, and 4.5% (407/9000) for new infection. The violation of this assumption didn't seem to affect the results of the tests as the proportion of times this assumption was violated were similar across different scenarios (Additional file 1, Section 5). Increasing the number of simulation runs to 10,000 from 1000 didn't change the result (Table 3, results from 10,000 simulation runs shown in parenthesis). However, there were small variations in the results when the simulation was repeated with different sample sizes ( Table 4).

Impact of sample size
In studies with n = 100, and 200 (which were known to be under-powered from the sample size calculations), both tests achieved their nominal 5% level i.e. rejection probability close to 5% for scenario 1 (Table 4). In scenarios 2 and 3, where the hazards ratio for recrudescence between the two drugs was 2.72 and 0.37, the rejection probability did not reach the required level of 0.8. As expected, when the sample size was increased to 1000 patients per arm, both tests achieved their nominal size in the null scenario with the exception of Gray's k-sample test for scenario 1B, which rejected the null hypothesis 21.7% despite there being no difference between the two drugs. In this scenario, the influence of sample size was apparent as the rejection probability using Gray's k-sample test progressively increased with an increase in study sample size. Both tests rejected the null hypothesis in nearly all simulations for scenarios 2 and 3.

Discussion
Competing risk survival analysis is increasingly being used in the medical and statistical literature [8,33]. However, this approach remains novel in the context of antimalarial research [34]. The K-M method is the currently recommended approach for deriving antimalarial drug efficacy of uncomplicated P. falciparum malaria. Theoretically, the K-M method overestimates the cumulative incidence of recrudescence in the presence of new infection [17]. The magnitude of this overestimation is currently not documented and the implications for comparative efficacy studies is unknown. In order to fill this research gap, we carried out two simulation studies using biologically plausible survival functions consistent with the underlying pharmacokinetics profile of the antimalarial drugs.
The first simulation study quantified the degree of overestimation in cumulative incidence of recrudescence using the naïve 1 minus K-M method compared to the CIF in a single-armed antimalarial trial. The magnitude of the overestimation was found to increase with the increasing proportion of recrudescence, new infection and study follow-up duration; a finding consistent with the statistical and medical literature [16,17]. The simulation study suggested that the estimates from the two approaches differed by less than 0.1% for most of the scenarios presented in Table 2; such differences are unlikely to have clinical consequences. In a scenario which reflected the current observations of drug efficacy with artemisinin combination therapies (> 95%), the overestimation was negligible in the areas of low transmission intensities, i.e. new infections lower than 10% (Table 2). For high transmission areas, this reached a maximum of 1.75%. However, we have also clearly identified several scenarios where the two methods will lead to a substantially different estimate. The magnitude of the overestimation was greatly increased when antimalarial drug efficacy began to decline. At 90% drug efficacy, the absolute deviation in derived estimates reached a maximum of 0.27% in the areas of low transmission and 3.13% for high transmission areas. When the efficacy fell to the low level of 85%, the overestimation reached 4.30% in the areas of high transmission. Similarly, in antimalarial studies, additional treatment is administered on detecting a recurrent parasitaemia. In such a scenario where the recurrence is due to a new infection, which has masked an existing low-density parasitaemia of the original infection (recrudescence), this would prevent the potential recrudescence from being observed due to additional antimalarial drugs. This will lead to an underestimation of failure. Taken together, our results highlight that estimation of drug failure in areas of high transmission requires careful attention and the CIF provides an alternative approach for deriving the failure estimates.
The second simulation study explored the results from the log-rank test for comparing the cause-specific hazard rates and Gray's k-sample test for comparing the cumulative incidences in comparative drug trials. A total of nine different hypothetical scenarios on how a new drug B might affect the recrudescence and new infection compared to an existing drug A were explored (Table 1). There were contrasting differences in two out of the nine scenarios. When drug B, compared to drug A, was associated with increased (or decreased) risk of both recrudescence and new infection, we found that log-rank test was more powerful compared to Gray's k-sample test for detecting differences between the two treatments. However, when drug B had higher risk of recrudescence and lower risk of new infection (or vice versa) compared to drug A, then Gray's k-sample test was more powerful in detecting the differences between the two drugs in terms of primary endpoint (Table 3). This finding is consistent with the results reported by two previous simulation studies in statistical literature [18,30]. However, it must be stressed that the latter scenario is less likely to be observed within the context of comparing antimalarial regimens in a real-life situation.
Our simulation study has a number of methodological limitations. First, time to recrudescence and new infection were generated assuming independence. While this greatly simplified the simulation settings, this is an assumption unlikely to be verified and carrying out simulation studies accounting for correlation between recrudescence and new infections remained beyond the scope of this work. Second, we assumed no losses to follow-up for simplicity. A loss to follow-up of approximately 20% is anticipated in antimalarial studies and this can be incorporated in the simulation studies as future work. Third, when simulating time to recrudescence, we used rejection sampling and kept the first 1000 observations with 4-6%, 9-11% and 14-16% recrudescence for the scenarios of 5, 10 and 15% recrudescence, respectively. This approach might have led to less variability between the 1000 simulated datasets. Fourth, in simulation study II, we simulated data based on reference drug A assuming low failure in the areas of low transmission (2.5% recrudescence and 21.4% new infections). Hence, the generalisability of results for comparative studies in areas of different transmission settings might be limited. And finally, this manuscript has focused on the point estimation of the derived failure estimates. However, we would like to emphasise that the uncertainty around the point estimates (associated 95% confidence interval) be given as equal importance as the point estimate.
Our results have important clinical consequences. The current WHO strategy for monitoring and evaluation of antimalarial drug efficacy uses a series of threshold-based approaches. For new drugs to be eligible for introduction as a first line treatment, derived failure estimates should be less than 5%, and for current first line treatments, the failure estimates should not exceed 10% [35]. The results presented in Fig. 4 highlighted the implications for drug policy usage when the derived estimates are at the cusp of these thresholds. The derived estimate of cumulative failure was greater than 5% (Fig.  4a) and 10% (Fig. 4b) when the K-M method was used, but remained below 5 and 10% respectively when using the competing risk survival analysis approach, i.e. the CIF. This highlights that ignoring the competing risk of new infections can result in potentially misleading conclusions being drawn from a clinical study, particularly in high transmission settings where a large fraction of patients may develop new infections during the follow-up period, thus confounding the derived efficacy estimates. Similarly, the effect of competing events has implications for not only standalone trials but also comparative drug trials, particularly when the partner component of the artemisinin combination therapies are eliminated at different rates. For example, lumefantrine, the partner drug in artemether-lumefantrine (AL), has an elimination half-life of 4 days and hence almost all antimalarial activity is sub-therapeutic within 16 days [36]. Conversely the elimination half-life of piperaquine (partner drug in dihydroatemsinin-piperaquine (DP)) is four weeks and it exerts prolonged post treatment prophylaxis, reducing the risk of recurrent infections for up to 42 days [36]. Hence, the observed proportion of competing risk events is expected to be significantly lower following DP compared to AL, especially in the areas of high transmission. When a large fraction of patients develop new infections, fewer patients are available from which recrudescences can be observed. Hence, it is important that the proportion of competing risk events be taken into consideration when comparing two regimens with different pharmacological properties.
There is an ongoing debate in medical and statistical literature regarding the choice of the method for comparing treatment regimens in the presence of competing risk events [19,30,[37][38][39]. It is increasingly being advocated that if the research interest is in understanding the biological mechanism of how a treatment affects hazard rate, the log-rank test is considered the appropriate method. However, when the interest is in comparison of overall risk i.e. if individuals receiving a particular drug are more likely to experience recrudescence, the comparison of CIF through Gray's k-sample test is considered appropriate [17,40,41]. Many authors advocate presenting results of both these approaches to provide a complete biological understanding of the treatment on different endpoints [17,42]. It is important that researchers are aware that the choice of the analytical method in the presence of competing risk events should be guided by the research question of interest.

Conclusions
Our simulation study showed that 1 minus K-M method led to an overestimation of cumulative antimalarial treatment failure compared to the CIF and the degree of overestimation was far greater in high transmission areas. In the areas where a large proportion of recurrences are attributable to new infections, the use of CIF should be considered as an alternative approach for the derivation of failure estimates for antimalarial studies. For comparative studies of antimalarial treatments, the choice of the statistical test should be guided by whether the rate or cumulative risk of recrudescence is the outcome of interest.