 Research
 Open access
 Published:
Comparison of statistical methods for the analysis of recurrent adverse events in the presence of nonproportional hazards and unobserved heterogeneity: a simulation study
BMC Medical Research Methodology volume 22, Article number: 24 (2022)
Abstract
Background
In preventive drug trials such as intermittent preventive treatment for malaria prevention during pregnancy (IPTp), where there is repeated treatment administration, recurrence of adverse events (AEs) is expected. Challenges in modelling the risk of the AEs include accounting for timetoAE and withinpatientcorrelation, beyond the conventional methods. The correlation comes from two sources; (a) individual patient unobserved heterogeneity (i.e. frailty) and (b) the dependence between AEs characterised by timedependent treatment effects. Potential AEdependence can be modelled via timedependent treatment effects, eventspecific baseline and eventspecific random effect, while heterogeneity can be modelled via subjectspecific random effect. Methods that can improve the estimation of both the unobserved heterogeneity and treatment effects can be useful in understanding the evolution of risk of AEs, especially in preventive trials where timedependent treatment effect is expected.
Methods
Using both a simulation study and the Chloroquine for Malaria in Pregnancy (NCT01443130) trial data to demonstrate the application of the models, we investigated whether the lognormal shared frailty models with restricted cubic splines and nonproportional hazards (LSFNPH) assumption can improve estimates for both frailty variance and treatment effect compared to the conventional inverse Gaussian shared frailty model with proportional hazard (ISFPH), in the presence of timedependent treatment effects and unobserved patient heterogeneity. We assessed the bias, precision gain and coverage probability of 95% confidence interval of the frailty variance estimates for the models under varying known unobserved heterogeneity, sample sizes and timedependent effects.
Results
The ISFPH model provided a better coverage probability of 95% confidence interval, less bias and less precise frailty variance estimates compared to the LSFNPH models. The LSFNPH models yielded unbiased hazard ratio estimates at the expense of imprecision and high mean square error compared to the ISFPH model.
Conclusion
The choice of the shared frailty model for the recurrent AEs analysis should be driven by the study objective. Using the LSFNPH models is appropriate if unbiased hazard ratio estimation is of primary interest in the presence of timedependent treatment effects. However, ISFPH model is appropriate if unbiased frailty variance estimation is of primary interest.
Trial registration
ClinicalTrials.gov; NCT01443130
Background
The development of a comprehensive drug safety profile in randomized controlled trials (RCTs) requires statistical methods that can adequately characterize the adverse event (AE) occurrence process (e.g. time of onset, recurrence and duration) [1]. Traditionally, analysis of AEs in IPTp trials, like all clinical trials, is predominantly descriptive and nonparametric such that advanced methods (e.g. parametric and semiparametric) are rarely used [2,3,4]. The use of descriptive and nonparametric methods can lead to poor estimates of the effect of treatment on AE recurrence. The descriptive and nonparametric methods may not account for essential aspects of the data structure such as censoring, potential timevarying treatment effect and the AEdependence. For example, mean cumulative function is the simple and popular nonparametric method used to facilitate the development of trajectories of AEs profiles over the study period, where a patient may experience more than one AEs [5]. However, the method fails to explicitly account for the potential dependence of the AEs or individual patient heterogeneity. The utility of the existing recurrent event methods towards the AEs analysis has been recently considered [6]. However, adopting and interpreting such recurrent events methods results in drug safety assessment setting remains a challenge. This motivates a need for further investigations on the clinical value and practical applicability of the advanced methods [7]. In the framework of benefitrisk assessment, two models (the Andersen Gill model and Prentice, Williams and Peterson model) have been demonstrated to be useful in providing both direct and indirect effects of treatment on AE recurrence [6]. However, these models yield unbiased estimates if the AEs are uncorrelated and do not efficiently capture (i.e. account for) both potential timedependent treatment effects and unobserved heterogeneity. This motivates the need to consider using flexible parametric shared frailty models that can optimally capture both the timedependent effects and unobserved heterogeneity, to improve drug safety estimates.
In IPTp clinical trials, the potential of recurrent AEs is high because the preventive antimalarial drugs are repeatedly administered. Recurrent AEs may induce withinpatient correlation that originates from two sources: the individual patient unobserved heterogeneity and the dependence between events [8]. The individual patient heterogeneity cannot be explained by the observed/known factors [9, 10], while the betweenAEdependence can be explained based on the observed AEs. The unobserved heterogeneity arises from unmeasured information that partly explains the treatment effect. For example, in IPTp trials, unobserved heterogeneity can arise from the unmeasured inherent characteristics (e.g. physiological changes associated with foetus development). The underlying individual patientspecific unobserved heterogeneity can influence the susceptibility of patients to AE recurrence. In IPTp trials, as the case with some drug trials, the treatment effect depends on followup time. Such treatment effect that is a function of followup time is called timevarying or timedependent treatment effect. In the presence of the timedependent treatment effect, the relative effect of treatment changes over the followup time such that the proportional assumption is violated (on log hazard scale). Ignoring the timedependent effects can lead to biased estimates. Therefore, accounting for the potential timedependent treatment effects in the analysis of recurrent events is necessary. However, in practice, identifying a model that can account for both unobserved heterogeneity and timedependent effects can present a challenge. Since the recurrence of AEs can be a reflection of both timedependent treatment effects and underlying individual patient unobserved heterogeneity. Therefore, it is vital to consider models that account for both issues.
Currently, shared frailty is one of the prevalent models for the analysis of recurrent events. This model can account for both the observed and unobserved heterogeneity. However, the common current applications of the shared frailty model assume that the treatment effect estimate (i.e. hazard ratio) is constant over the followup time [11]. In preventive drug trial settings, where there is repeated treatment administration and a long patient followup time, the proportional hazard assumption can be invalid when treatment hazard rates, between/across arms, for recurrent events can vary over the followup time. For example, in the context of IPTp drug safety assessment, more recurrent AEs are expected close to the day(s) of taking the days. The potential presence of timedependent treatment effects can pose a challenge in applying the shared frailty model to analyse recurrent events since it becomes complicated to separate timedependent effects from unobserved heterogeneity [12].
Recent developments in flexible parametric survival models can effectively capture both timedependent effects and unobserved heterogeneity [13]. Researchers have established, mainly in univariate survival data context, that flexible parametric survival models with restricted cubic splines can accurately model timedependent effects [14, 15]. However, there is limited literature investigating the efficiency and utility of such methods in the context of multivariate survival data and drug safety assessment. Within multivariate survival data setting, Gasparini et al. recently compared the standard and splinebased shared frailty models under varying heterogeneity and baseline hazard specification assuming proportional hazards [16]. The shared parametric models with splines emerged to be more robust to model misspecification than the standard approaches. However, their work is limited to the context of the proportional hazards assumption. As highlighted above, a challenge arises in applying the shared frailty model when there are timedependent effects. Using the standard semiparametric shared frailty models, Balan and Putter demonstrated that the bias of frailty variance estimate, in the presence of timedependent effects, can be mitigated as the cluster size (i.e. number of recurrent events within an individual) increases [12]. Based on the available literature, it is not clear whether the use of flexible parametric shared frailty with splines models can improve the frailty estimates compared to the standard shared frailty models in the presence of both timedependent treatment and unobserved heterogeneity.
Our work aims at investigating whether the flexible parametric shared frailty model with nonproportional hazards and restricted cubic splines can improve both the frailty variance and hazard ratio estimates compared to the standard parametric shared frailty models, in the presence of both unobserved heterogeneity and timedependent treatment effects. The optimal shared frailty model can efficiently capture both unobserved heterogeneity and timedependent effects. We evaluated the following models; (a) inverse Gaussian shared frailty (ISFPH) model with proportional hazards and Weibull baseline hazard (b) Lognormal shared frailty model with nonproportional hazard and restricted cubic splines where we model the baseline hazard function with 1 degree of freedom and model the timedependent treatment effect with 1 degree of freedom (LSFNPHS1) and (c) Lognormal shared frailty model with nonproportional hazard and restricted cubic splines where we model the baseline hazard function with 3 degrees of freedom and model the timedependent treatment effect with 1 degree of freedom (LSFNPHS3). We compare the models using a simulation study assuming an application area of recurrent AEs in IPTp trials. Understanding the statistical properties would enhance the informed choice of statistical methods for the analysis of recurrent AEs. The simulation offers an opportunity to effectively understand the properties of the statistical models since the true data distribution is known, unlike in a practical clinical trial setting. We present the remaining sections of the paper as follows: In the methods section, we provide a detailed review of the models under study, outline the simulation study design and describe the analysis approach. In the results section, we present the findings of our simulation study and apply the models to the analysis of recurrent AEs observed in Chloroquine for Malaria in pregnancy trial (NCT01443130). In the discussion section, we discuss the key findings of our work and provide a conclusion.
Methods
This section outlines notation and provides an overview of the three models that were compared in our simulation study. Specifically, we briefly introduce and discuss the general inferential methodological framework of the models, focussing on model specification, parameter estimation and respective assumptions. The presentation of this section is customized to the context of drug safety assessment in clinical trials. For readers interested in detailed concepts, on recurrent events analysis, they can consult excellent introductory books on recurrent events analysis and frailty models methodology [10, 17, 18], written for a wider audience.
Notation
We assume that the observation begins at enrolment, denoted as time t = 0. Let T_{ik} be the total time of the k^{th} event for the i^{th} patient where i = 1, 2, ………n, k = 0, 1, 2, ……. p and we assume T_{i1} < T_{i2} < … < T_{ik}. The AEs are assumed to be recorded over a period of time interval [0, τ] such that the total number of AEs recorded per patient i over that time is \(=\sum_{k=0}^{\infty }I\left({T}_{ik}\le t\right)\) . The history of the recorded number of events over given time t, denoted as N(t), 0 ≤ t, is defined as a counting process; this forms the basis for easytoimplement recurrent events analysis. The frailty is defined as φ_{i} and interpreted as an unobserved covariate common for all recurrent AEs within the patient. Covariate and regression coefficients vectors are defined as X_{ik} and β respectively. The covariates from the covariates vector are considered to be fixed. Under the simulation study design described below, one covariate is considered (i.e. treatment). For the nonproportional hazard models, the treatment effect is allowed to be dependent on time (i.e. β(t)). These are further discussed in context in the section below.
Statistical methods for recurrent events
Recurrent events modelling is popularly based on three main approaches; event counts, event intensity function and joint distribution of gap times between successive events. Event countsbased models are suitable when there are frequent events and event occurrence does not impact the occurrence of the subsequent events [17, 19]. In the context of AEs, some mild AEs may not alter the event process (e.g. mild headache). When events are infrequent and the prediction of the next event is of interest, gap times methods are appropriate [20, 21]. To ensure that the event process history is appropriately incorporated, the intensity function is used; the function defines the instantaneous probability of an event occurring at time t given the event process history [17].
Where ∆N(t) is the number of events in the time interval [t, ∆t), H(t) is the history of the event process at time t. Since event counts (i.e. Poisson process) modelling approach assumes that the events are independent such that the event history process at t, H(t), does not affect the instantaneous probability of an event at time t, the intensity function is reduced to the rate function ρ(t) below;
This implies; ∆tρ(t) = E(∆N(t)). Hence if we let the expected cumulative rate at time t to be μ(t), we can derive the expected cumulative rate by integrating the rate function over the interval [0, t).
The gap/waiting times method is another alternative approach that focusses on modelling time between consecutive events. Methods based on gap time are suitable mainly in setting where the occurrence of the events is considered relatively infrequent [17]. The work in this paper is confined to multiplicative intensitybased models because it permits deriving and interpretation of failure intensity relative to the treatment under investigation. Shared frailty models are considered as important intensitybased models for the analysis of recurrent events that can account for both measured and unmeasured heterogeneity.
The shared frailty models
In practice, the common approach to account for unobserved heterogeneity in recurrent events analysis is to use a robust variance (i.e. robust sandwich covariance matrix is used in estimating the log hazard ratio). Adjusting for withinsubject unobserved heterogeneity using robust variance has been shown to be inadequate in some settings [22]. One of the solutions to this is explicitly modelling the correlation using patientspecific random effect. Such a model is defined as a frailty model. Frailty models are hazard models having a multiplicative frailty factor [10]; these are conditional models. Overall, the model has frailty, linear predictor and the baseline hazard function as the multiplicative factors. The random effect introduced in frailty models describes the excess risk that cannot be explained by the measured/observed variables [10]. The unmeasured observations may introduce heterogeneity between the analysis units e.g. for recurrent AEs, observations within a patient may be more similar compared to different patients. Hence frailty model attempts to account for the withinpatient correlation. Ignoring the potential heterogeneity in modelling recurrent events can lead to underestimation of treatment effect [23, 24].
For recurrent events, when the frailty is modelled as a common random effect for all events within a patient, it is defined as a shared frailty model (see model M4). The dependence accounted for in the shared frailty model is only due to the unobserved individual heterogeneity, φ_{i}. This measure of the extent of heterogeneity among the patients can follow several distributions [18]. Gamma and inverse Gaussian distributions are the common frailty distribution specifications, possibly due to their mathematical tractability and software availability. Across all the patients, the respective random effects are assumed to be independent and identically distributed.
If the baseline hazard, λ_{0}(t_{ij}), is unspecified, the model is called a semiparametric frailty model. Specifying the baseline hazard function ensures that an appropriate shape of the hazard function is captured. However, missspecification of the baseline hazard distribution may induce bias on regression coefficients [16]. Therefore, caution needs to be exercised on the choice of the baseline hazard distribution by paying attention to the most clinically plausible hazard function. Traditionally, the hazards are assumed proportional over the whole followup time. Within the flexible parametric survival methods framework, the proportional hazards assumption can be relaxed to accommodate timedependent treatment effects (i.e. nonproportional hazards [13]). This can be implemented by interacting treatment with splines of log time as part of the linear predictor. Fitting such flexible nonproportional shared frailty model on log hazard scale is helpful since it can effectively accommodate interpretation and presentation of results. The shared frailty model with nonproportional hazards and restricted cubic splines can be written as;
The treatment x_{ik} is interacted with restricted cubic function of log time, s{log(t) γ, k_{0}}, such that γ is parameter vector and k_{0} is knot vector. Similarly, s{log(t) δ_{f}, k_{f}} represents a spline function of log time for the f^{th} timedependent effect with parameter vector δ_{f} and knots vector k_{f}. Since the number of restricted cubic splines can determine the most appropriate model to use we considered to evaluate two flexible parametric shared frailty models as follows;

(i)
LSFNPHS1: We considered a shared frailty model assuming lognormal frailty distribution and nonproportional hazards with restricted cubic splines where the baseline hazard function is modelled using 1 degree of freedom and the timedependent treatment effect is modelled using 1 degree of freedom.

(ii)
LSFNPHS3: We also considered a shared frailty model was fitted assuming lognormal frailty distribution and nonproportional hazards with restricted cubic splines where the baseline hazard function is modelled using 3 degrees of freedom and the timedependent treatment effect is modelled using 1 degree of freedom.
Generally, under shared frailty models, depending on the model specification, parameter estimation can be done using expectationmaximization algorithm, maximum likelihood penalized partial likelihood and Markov chain MonteCarlo methods [18]. Since in this paper we focus on parametric shared frailty models from a Frequentist perspective, the parameter estimation is based on maximum likelihood estimation. It should be noted that in drug clinical trials not all patients may experience the AEs (i.e. some patients are censored). Censoring in this paper is considered independent of the AE occurrence. Specifically, the model parameters can be estimated by maximizing marginal loglikelihood as shown below; Let Z be the vector containing the observed patientspecific information such that
where y_{ij}, ξ_{ij} and x_{ij} are time to AE occurrence, AE occurrence indicator (1 for AE occurrence and 0 otherwise) and fixed covariate for patient i and jth AE occurrence, respectively. Let U be a vector containing the latent information (the frailty terms) for n patients.
therefore, the marginal loglikelihood of the observed data Z can be written as:
where d_{i} represents the total number of events for patient and \({\mathcal{L}}^{(v)}(.)\) represents the v^{th} derivative of the Laplace transform for the derivative. Availability of the Laplace transform simplifies the maximization of the loglikelihood in order to compute the desired model parameters. Under the timedependent treatment effect, the fixed treatment effect β can be replaced by the timedependent treatment effect β(t).
Simulation study
This simulation study was conducted and reported based on “Aims, Data generating process, Method of analysis, Estimands and Performance” approach [25] since it provides a scientifically coherent structured framework for designing, interpreting and reporting simulation studies.
Aim
There is a dearth of literature highlighting the performance of recurrent statistical methods when both AEdependence and heterogeneity is accounted for in analysis of recurrent AEs in clinical trials. The aim of the simulation study was to evaluate whether the use of Lognormal shared frailty model with restricted splines and nonproportional hazards improves both the frailty variance and hazard ratio estimates in the presence of timedependent effects and unobserved heterogeneity under varying unobserved heterogeneity levels and sample sizes. The performance of this flexible model was compared against the standard proportional hazardsbased inverse Gaussian shared frailty model with proportional hazard and Weibull baseline hazard distribution assumptions.
Data generating mechanisms
The data generating mechanism (DGM) in this simulation study focussed on the specification and estimation of frailty variance and hazard ratio in the presence of timedependent treatment effects and unobserved heterogeneity (Table 1). For each scenario, we considered 1000 simulations, lognormal frailty distribution, Weibull baseline hazard, fixed effect treatment effect of log hazard ratio 0.5 and up to a maximum of 4 AEs experienced by a patient. The choice of possible maximum of 4 AEs per patient was aimed at mimicking a setting for moderately high occurrence of recurrent AEs, in IPTp trials. The choice of lognormal frailty distribution was driven by the mathematical tractability of the distribution, availability of software for implementation and interpretability of the results in the context of AEs. The DGMs varied sample size (n = 300, n = 500), frailty distribution variance (0.25, 0.50, 0.75) and timedependent effect (0.03, − 0.03). The choice of the sample size was intended to reflect sample sizes that are commonly employed in most IPTp trials. Choice of frailty variance (i.e. a measure of the strength of the unobserved heterogeneity within an individual) intended to represent a setting 0.25, 0.5 and 0.75 represented weak variance, moderate variance and heavy variance respectively. The timedependent treatment effect allowed the fixed effect treatment effect of log hazard ratio 0.5 to decay over time (when set at − 0.03) or increase over time (when set at 0.03) or remain constant when set at 0.0). All the DGMs were parametrically based, to accommodate the investigation of several scenarios unlike resampling that confines investigation to data with properties only for the application data under study [25]. Hence, we simulated recurrent AE data similar to common scenarios that are practically and theoretically expected in IPTp trials. In all the simulation scenarios, the data generation mimicked a randomized controlled trial with two treatment arms. In all scenarios under study, we assumed that there were no substantial competing risks observed during the trial such that censoring was considered independent. The individual followup times were simulated based on uniform distribution and maximum followup time of 12 months (i.e. administrative censoring was set at 1 year), using Survsim package in stata. The choice of 12 months of followup time was motivated by the fact that most IPTp trials followup the mother for 12 months from enrolment to beyond delivery.
The hazard functions were simulated using inversion method, under Weibull baseline hazard function, accommodating both timevarying treatment effect and unobserved heterogeneity with lognormal frailty distribution. The inversion method enabled sampling from a given distribution using uniform distribution and quantile function [26] where uniform distribution was assumed to have minimum 0 and maximum of 1. An overview of the investigated scenarios appears in Table 1 below.
During the data generation process, convergence of the models was also continuously monitored. After the data generation descriptive summary statistics were also done to check for any abnormal observations. For each scenario, graphical exploration of the distribution of the estimates was done to ascertain the assumed underlying distribution of the generated data.
Simulation study targets
Of primary interest in this study was shared frailty models` ability to effectively capture the unobserved heterogeneity. Therefore, the primary estimand of interest was log frailty variance φ_{i} for each of the investigated models. Since our second interest was to assess the ability of the models in capturing timedependent treatment effect, the log hazard ratio β computed at 3 months of followup time was considered as a secondary estimand. We opted for log hazard ratio at the specific followup time in order to ensure that the estimates from both the proportional and nonproportional hazard models are comparable.
Methods for data analysis
Each simulated data set was analysed using the three methods under investigation: ISFPH, LSFNPHS1 and LSFNPHS3. Based on our simulation scenarios, the ISFPH model missspecified the proportionality of the hazards although it correctly specified the baseline hazard function (i.e. the baseline hazard function for the ISFPH was modelled using Weibull distribution). The LSFNPHS1 and LSFNPHS3 models correctly specified the proportionality of the hazards although the baseline hazard function is approximated by the restricted cubic splines. When there are nonproportional hazards, reporting of a single average HR is considered not practically useful. Therefore, in order to ensure that the performance measures for the HR estimates are comparable across the three models, we computed hazard ratio at 3 months of followup time since enrolment (i.e. LSFNPHS1 and LSFNPHS3 are nonproportional hazard models and ISFPH is a proportional hazard model). The choice of the time at which the HR was calculated was motivated by our practical experience where most patients take approximately half of their scheduled IPTp doses by third month of followup time.
The three shared frailty models (ISFPH, LSFNPHS1 and LSFNPHS3) are also used in the analysis of recurrent adverse events that were observed in the Chloroquine for Malaria Prevention in Pregnancy trial (NCT01443130) data. The motivating data description, data analysis and results appear in the application section of this paper, after simulation results presentation.
Performance measures
In this paper, our model performance measures of interest were defined as described in Morris et al [25]. Bias and MSE were main performance measures for our estimands. For frailty variance estimand, we assessed the model performance based on precision, bias, MSE and coverage probability of 95% confidence interval of the frailty variance estimates. For the hazard ratio estimand, we assessed the model performance based on precision, bias and MSE. Point estimates of performance are reported including Monte Carlo standard errors in order to highlight the certainty of our point estimates.
We used bias in order to quantify whether the estimator of interest targeted the true value of interest. Bias is defined as the difference between the expected value of an estimator and the true parameter value. Secondly, in order to integrate bias and variance of an estimator into one performance measure, we computed the mean square error (MSE). MSE is defined as sum of the squared bias and variance of the estimator. In defining relative precision gain, ISFPH model was considered as a reference model. Therefore, precision gain is defined as the inverse squared ratio of the empirical SE of the respective flexible parametric model (i.e. LSFNPHS1 or LSFNPHS3) to the empirical SE of the ISFPH model. Additionally, coverage was also computed, defined as proportion of confidence intervals for an estimator that contain the true parameter value.
Software and computer hardware specification
The simulation study was conducted using userwritten survsim and merlin packages in Stata version 15.1 [27,28,29]. The analyses of the simulated and application data were also done using streg builtin function in Stata for the ISFPH model.
The simulation study was performed using the Stata version 15.1 that was installed in a purposelybuilt Threadripper 3990X GT 1030 workstation, equipped with a Threadripper 3990X (288 MB Cache, 64x Cores, 4.3GHz) turbo processor, MSI TRX40 PRO 10G AMD Ryzen Threadripper Motherboard, Nvidia GeForce RTX 2070 SUPER OC Ed. 8GB GDDR6 GPU, Corsair Vengeance PRO RGB 3200 MHz 64 GB HighPerformance Gaming, Hikvision E2000 1 TB M.2 SSD that reads up to 3.5 GB/s and a 4 TB HighPerformance Hard Disk Drive (HDD).
Ethical considerations
The Chloroquine for malaria in pregnancy trial was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice guidelines. The trial obtained ethical approval from the Institutional Review Board at the University of Maryland, the College of Medicine Research and Ethics Committee at the University of Malawi and the Malawi Pharmacy Medicines and Poisons Board. The current study was approved by the University of the Witwatersrand’s Human Research Ethics Committee.
Results
Frailty variance estimates coverage probability, bias and MSE across the shared frailty models
Based on 1000 simulations for each scenario, we observed a 100% convergence rate of all the models that were fitted. Across all the scenarios, the magnitude of bias for the frailty variance estimates increased as the known unobserved heterogeneity variance increased from 0.25 to 0.75 whether the sample size was 300 or 500 (Table 1). The observed bias across all the scenarios was negative suggesting underestimation of the unobserved heterogeneity. The magnitude of bias for the frailty variance estimates were slightly higher when the sample size was 300 compared to the sample size of 500. Overall, the coverage probability of the frailty variance estimates consistently decreased as the known unobserved heterogeneity variance increased from 0.25 to 0.75. The coverage probability across all the models was slightly above 95% for scenarios where the unobserved heterogeneity variance was 0.25, suggesting some negligible overcoverage. For example, for scenario 1, coverage (MCSE) was 96.40 (0.5891), 95.90 (0.6270) and 96.20 (0.6046) for ISFPH, LSFNPHS1 and LSFNPHS3 models respectively. Overall, substantial undercoverage was observed for scenarios where the unobserved heterogeneity variance was 0.50 or 0.75 among the flexible parametric models (LSFNPHS1 and LSFNPHS3). The undercoverage worsened as the sample size increased from 300 to 500. Such undercoverage suggested inaccuracy of the frailty variance estimates obtained using the LSFNPHS1 and LSFNPHS3 models.
As expected, overall, among the nonproportional hazard models with restricted cubic splines, the LSFNPHS3 had lower bias and higher coverage of the frailty variance estimates compared to the LSFNPHS1. This result suggests that for the nonproportional hazard models with restricted cubic splines, increasing the number of degrees of freedom modelling the baseline hazard function from 1 to 3 improves the frailty variance estimates. For the scenarios where the known unobserved heterogeneity was 0.50 or 0.75, we observed a higher magnitude of bias of frailty variance estimates for the nonproportional hazard models with restricted cubic splines models compared to the ISFPH model. For the scenarios where the known unobserved heterogeneity variance was 0.25, we observed the lowest magnitude of bias for the frailty variance estimates in LSFNPHS3 model followed by ISFPH model then LSFNPHS1 model.. For example, under scenario 1, the frailty variance estimate bias (MCSE) was − 0.0389 (0.0064), − 0.0556 (0.0060), − 0.0263 (0.0061) for ISFPH, LSFNPHS1 and LSFNPHS3 models respectively. Pertaining to coverage, although the coverage probability decreased with increase in unobserved heterogeneity variance, decreasing the sample size from 500 to 300 yielded results with higher coverage. The decrease in coverage probability for the frailty variance estimates with increased unobserved heterogeneity was more drastic among the nonproportional hazards with restricted cubic splines models (LSFNPHS1 or LSFNPHS3) compared to the ISFPH model. Overall, the ISFPH model consistently yielded coverage probability closer to the nominal level of 95% compared to the nonproportional hazards with restricted cubic splines models (LSFNPHS1 or LSFNPHS3).
Overall, the MSE for the ISFPH model frailty variance estimates were lower compared to the LSFNPHS1 and LSFNPHS3 models, except when the known unobserved heterogeneity variance was 0.25. Across all the 18 scenarios studied, we observed that the MSE estimates for the ISFPH model decreased as the unobserved heterogeneity variance increased. However, the MSE estimates for the LSFNPHS1 and LSFNPHS1 models increased as the unobserved heterogeneity variance increased: Notably, the MSE for the LSFNPHS3 model yielded lower MSE estimates compared to the LSFNPHS1 model. As expected, increasing the sample size from 300 to 500 reduced the MSE estimates for the corresponding scenarios.
Our investigation showed that LSFNPHS1 and LSFNPHS3 models were more efficient than the ISFPH model as demonstrated by the lower mean of modelbased standard errors (Fig. 1). The difference in modelbased standard errors between that LSFNPHS1 and LSFNPHS3 models was negligible. As expected, increasing sample size from 300 to 500 improved the efficiency of the shared frailty models under study such that the modelbased standard errors for the frailty variance estimates decreased.
Percentage gain in precision for frailty variance estimates of nonproportional hazards shared frailty models with restricted cubic splines relative to ISFPH model
Overall, as the unobserved heterogeneity increased, percentage gain in precision for frailty variance estimates of nonproportional hazards shared frailty models with restricted cubic splines relative ISFPH model also increased (Table 1). The gain in precision was higher when timedependent treatment effects were increasing (i.e. TDE = 0.03) compared to the decaying timedependent treatment effects (TDE = 0.03). Across the models, the gain in precision was higher in the LSFNPHS1 model compared to LSFNPHS3 model. The observed gain in precision for LSFNPHS1 and LSFNPHS3 models compared to ISFPH model under proportional hazard scenarios (i.e. TDE = 0) was similar to the scenarios when timedependent treatment effects were increasing (i.e. TDE = 0.03).
Bias and MSE for the estimated log hazard ratio estimates across the shared frailty models under varying unobserved heterogeneity and sample size
Overall, across all the three models under study, the MSE for the HR estimates increased as the unobserved heterogeneity variance increased (Table 2). The ISFPH model consistently yielded lowest MSE estimates for the HRs across the three models regardless of the nature of the timedependent treatment effect. Increasing the sample size from 300 to 500 also led to reduction in the estimated MSE for the HRs. We observed a loss in precision of HR estimates for the ISFPH model compared to the LSFNPHS1 and LSFNPHS3 model. The precision loss ranged from 9 to 20%. The highest precision loss was observed when the unobserved heterogeneity was low (i.e. frailty variance equal to 0.25). The precision loss estimates were similar for both LSFNPHS1 and LSFNPHS3 models. There was low precision loss in the absence of timedependent treatment effects (TDE = 0). In the presence of timedependent treatment effects (i.e. TDE = 0.03 and TDE = 0.03), we observed higher magnitude of bias in the ISFPH model compared to the LSFNPHS1 and LSFNPHS3 models. In the absence of the timedependent treatment effects, the bias for the HR estimates was lower in the ISFPH model compared to the LSFNPHS1 and LSFNPHS3 models. However, in the absence of the timedependent treatment effects, the bias across all the three models was relatively lower (i.e. under 0.01) compared to scenarios when timedependent effects are present.
Example data application: Chloroquine for malaria in pregnancy trial
We illustrated the application of the three investigated shared frailty models above (ISFPH, LSFNPHS1 and LSFNPHS3) to data on recurrent AEs collected from a clinical trial evaluating antimalarial drugs to prevent malaria in pregnancy. The trial is called Chloroquine for malaria in pregnancy.
The current modelling focussed on investigating the effect of treatment on recurrent AEs. We included a population of 600 pregnant women in their first or second trimesters who were randomized to receive IPTp of (a) sulfadoxinepyrimethamine (SP, two doses, 4 weeks apart), (b) chloroquine (CQ, 600 mg on day 1, 600 mg on day 2, and 300 mg on day 3 two days apart). Although the original trial had three treatment arms, in the current analysis, we dropped the arm of CQ prophylaxis (to focus on IPTp arms) since its treatment schedule was radically different from the IPTp arms. Full details of the trial are published elsewhere [30].
Of the 600 pregnant women analysed in the current analysis, 474 (79%) experienced at least one AE. Cumulatively, more AEs were experienced in the CQ arm (895 AEs) than those in the SP arm (650 AEs). We fitted ISFPH model followed by the LSFNPHS1 and LSFNPHS3 models where IPTp treatment was an exposure of interest. The SP treatment was considered as a reference treatment in the interpretation of the results. ISFPH assumed the baseline hazard function followed Weibull distribution, unobserved heterogeneity followed inverse Gaussian distribution and the hazard rates between the CQ and SP treatments were proportional over the followup time. The nonproportional hazard models with restricted models (NPHS1 and LSFNPHS3) assumed that unobserved heterogeneity followed lognormal distribution and timedependent effects could be modelled using one degree of freedom. NPHS1 and LSFNPHS3 approximated the baseline hazard function using 1 and 2 degrees of freedom respectively. We fitted adjusted and unadjusted model for all the three model under investigation. The adjusted model included the following covariates: treatment, maternal age, maternal body mass index and gravidity (i.e. whether the pregnant woman had previous pregnancy or not). We adjusted for the same covariates for the three models to ensure that the estimates were comparable. Since the results from the adjusted and unadjusted models were similar, our results reported in Table 3 below are based the unadjusted models since we considered them more parsimonious. The reported hazard ratios across the three models are based on marginal hazard ratio, predicted hazard ratio at 14 days and 60 days of followup. Additionally, we plotted the hazard ratio estimates over the followup time.
In this clinical example for recurrent AEs data analysis, the estimated marginal log hazard ratios were 0.2281 (0.0499)0.2328(SE: 0.0517), 0.3090 (0.0769), 0.3360 (SE:0.0850) and 0.3024(0.0746) for ISFPH, LSFNPHS1, LSFNPHS2, LSFNPHS3 and LSFNPHS4 models respectively, indicating an increased risk of AEs among the women who took the CQ treatment arm compared to those in the SP treatment arm (Table 3). At 14 days, the Log HR across the nonproportional hazard models were similar ranging between 0.43 to 0.47 and were higher compared to the ISFPH model. However, at 60 days only LSFNPHS2, LSFNPHS3 and yielded similar log HR estimates approximately 0.20 and were lower compared to the ISFPH model estimates. The frailty variance estimates generally higher among the nonproportional hazard models compared to the ISFPH model. Among the nonproportional hazards models with restricted cubic splines, the AIC estimates show that the model fit was good when the baseline hazard function was approximated with minimum of 2 degrees.
Discussion
The analysis of recurrent AEs in clinical trials rarely account for potential unobserved heterogeneity and timedependent effects [2]. Using a simulation study, we investigated the performance of three statistical methods for recurrent events that can be adapted towards the analysis of recurrent AEs in clinical trials, in the presence of both timedependent effects and unobserved heterogeneity. Our paper mainly focused on evaluating the ability of the flexible parametric shared frailty models with nonproportional hazards and restricted cubic splines to capture both the unobserved heterogeneity and timedependent treatment effects. We established that the inverse Gaussian shared frailty with proportional hazard model consistently yielded good coverage closer to the nominal level of 95%, lower bias and lower MSE of the frailty variance estimates at the expense of imprecision compared to the flexible shared frailty models with nonproportional hazards and restricted cubic splines. Interestingly, the frailty variance estimates coverage and bias, for the shared frailty models with nonproportional hazards and restricted cubic splines, improved upon increasing the number of degrees of freedom used to model the baseline hazard function. As expected, the flexible parametric shared frailty model with nonproportional hazards and restricted cubic splines yielded better hazard ratio estimates, reflecting the timedependent treatment effects, compared to the inverse Gaussian shared frailty with proportional hazard model. The flexible parametric shared frailty models with nonproportional hazards and restricted cubic splines yielded unbiased estimates of HRs at the expense of imprecision and inaccuracy (i.e. high MSE) compared to the ISFPH model. Therefore, our findings show that the more precise estimators were biased and imprecise estimators were unbiased. In drug safety assessment using methods that yield an unbiased hazard ratio would be of primary interest such that the flexible parametric shared frailty model should be considered if there is the presence of both unobserved heterogeneity and timedependent effects. However, such a decision should be made at the cost of imprecision and loss of accuracy. Hence, we emphasize that the flexible shared frailty with nonproportional hazards and restricted cubic splines should be applied only when the proportional hazards assumption is violated. If interest is in estimating frailty variance, the inverse Gaussian model with proportional hazards can be considered sufficient even in the presence of both timedependent treatment effect and unobserved heterogeneity.
The inverse Gaussian shared frailty with proportional hazard probably effectively captured the unobserved heterogeneity because we did not misspecify the baseline hazard (i.e. Weibull distribution). Secondly, the longer followup period of 12 months made the population more homogeneous with time since the unobserved heterogeneity for inverse Gaussian frailty decays over the followup time [31]. Thirdly, the inverse Gaussian frailty is practically close to the lognormal frailty where the simulated data arose. The structure of the inverse Gaussian frailty would be considered more appropriate in the context of recurrent AEs in antimalarial drug trials where there are more frequent AEs early in the clinical trial [18]. Unfortunately, the conditional hazard ratio derived from the inverse Gaussian shared frailty with proportional hazard is constant over the followup time which does not reflect the nonproportionality of the hazards in the presence of timedependent treatment effect.
The interpretation of our results is confined to scenarios where the censoring is assumed to be noninformative. In the presence of informative censoring, it will be interesting to understand how the models perform. Failure to account for time to medication presents a limitation in all the models considered in this work. Another key limitation of our study is relying on the assumption of constant unobserved heterogeneity. Future work can consider evaluating the performance of the shared frailty models accounting for both actual time to medication and time to AE occurrence under complicated scenarios where there is both timevarying treatment effects and timevarying unobserved heterogeneity.
Generally, in drug clinical trials, understanding drug safety profile requires efficient modelling of the recurrent AEs including accounting for both unobserved heterogeneity and any nonlinear effects of the investigated drug. The methods studied in this paper provide key knowledge on choosing the most appropriate model for analysing recurrent AE data presenting the unobserved heterogeneity and timedependent effects issues. The flexible parametric shared methods accounting for both timedependent effects and unobserved heterogeneity are directly useful in IPTp trials since physiological changes that occur during pregnancy are mostly unobserved and can contribute to the heterogeneity of the treatment effect.
Based on our findings, we provide some practical recommendations on the studied shared frailty models to analyse the recurrent AEs when the hazards are nonproportional and the potential unobserved heterogeneity. Overall, the choice of the most appropriate model should be driven by the objective of the analysis. For instance, if interest is mainly in estimating hazard ratios, a shared frailty model with nonproportional hazards and restricted cubic splines is appropriate. When interest is in obtaining the unobserved heterogeneity estimates, inverse Gaussian with proportional hazard can serve the purpose. In the absence of timedependent effects, using a shared frailty model with nonproportional hazards and restricted cubic splines should be avoided since it leads to biased estimates due to model overfitting.
Availability of data and materials
The code for simulated data generated is available from the corresponding author upon reasonable request. The Chloroquine for Malaria in Pregnancy (NCT01443130) trial data that support the findings of this study are available from Blantyre Malaria Project, Malawi but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the corresponding author upon reasonable request and with permission of Blantyre Malaria Project, Blantyre, Malawi.
Abbreviations
 IPTp:

Intermittent preventive treatment for malaria prevention during pregnancy
 AEs:

Adverse events
 RCTs:

Randomized controlled trials
 ISFPH:

Gaussian shared frailty model with proportional hazard
 LSFNPH1:

Lognormal Shared frailty model with nonproportional hazards and single restricted cubic spline
 LSFNPH3:

Lognormal Shared frailty model with nonproportional hazards and three restricted cubic splines
 DGM:

Data generating mechanism
 TDE:

Timedependent effects
 MCSE:

MonteCarlo Standard error
 CQ:

Chloroquine
 SP:

Sulfadoxinepyrimethamine
References
Munsaka MS. A questionbased approach to the analysis of safety data. In: Biopharmaceutical Applied Statistics Symposium. Biostatistical analysis of clinical trials, Vol. 2. Singapore: Springer; 2018. p. 193–216.
Patson N, Mukaka M, Otwombe KN, Kazembe L, Mathanga DP, Mwapasa V, et al. Systematic review of statistical methods for safety data in malaria chemoprevention in pregnancy trials. Malar J. 2020;19(1):119.
Phillips R, Cornelius V. Understanding current practice, identifying barriers and exploring priorities for adverse event analysis in randomised controlled trials: an online, crosssectional survey of statisticians from academia and industry. BMJ Open. 2020;10(6):e036875.
Colopy MW, Gordon R, Ahmad F, Wang WW, Duke SP, Ball G. Statistical practices of safety monitoring: an industry survey. Ther Innov Reg Sci. 2019;53(3):293–300.
Siddiqui O. Statistical methods to analyze adverse events data of randomized clinical trials. J Biopharm Stat. 2009;19(5):889–99.
Hengelbrock J, Gillhaus J, Kloss S, Leverkus F. Safety data from randomized controlled trials: applying models for recurrent events. Pharm Stat. 2016;15(4):315–23.
Cabarrou B, GomezRoca C, Viala M, Rabeau A, Paulon R, Loirat D, et al. Modernizing adverse events analysis in oncology clinical trials using alternative approaches: rationale and design of the MOTIVATE trial. Investig New Drugs. 2020;38:1879–87.
BoxSteffensmeier JM, De Boef S. Repeated events survival models: the conditional frailty model. Stat Med. 2006;25(20):3518–33.
Hougaard P. Analysis of multivariate survival data. New York: Springer; 2012.
Duchateau L, Janssen P. The frailty model. New York: Springer; 2007.
Amorim LD, Cai J. Modelling recurrent events: a tutorial for analysis in epidemiology. Int J Epidemiol. 2015;44(1):324–33.
Balan TA, Putter H. Nonproportional hazards and unobserved heterogeneity in clustered survival data: when can we tell the difference? Stat Med. 2019;38(18):3405–20.
Crowther MJ. Extended multivariate generalised linear and nonlinear mixed effects models. 2017. https://arxiv.org/abs/1710.02223. Accessed 5 Dec 2020.
Bower H, Crowther MJ, Rutherford MJ, Andersson TML, Clements M, Liu XR, et al. Capturing simple and complex timedependent effects using flexible parametric survival models: a simulation study. Commun Stat Simul Comput. 2021;50(11):3777–93.
Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of timetoevent data: a simulation study. J Stat Comput Simul. 2015;85(4):777–93.
Gasparini A, Clements MS, Abrams KR, Crowther MJ. Impact of model misspecification in shared frailty survival models. Stat Med. 2019;38(23):4477–502.
Cook RJ, Lawless J. The statistical analysis of recurrent events. New York: Springer; 2007.
Hanagal DD. Modeling survival data using frailty models. Singapore: Springer; 2019.
Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Ann Stat. 1982;10(4):1100–20.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. Newyork: Wiley; 2011.
Lawless JF. Statistical models and methods for lifetime data. Newyork: Wiley; 2003.
Kelly PJ, Lim LLY. Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med. 2000;19(1):13–33.
Babiker A, Cuzick J. A simple frailty model for family studies with covariates. Stat Med. 1994;13(16):1679–92.
Pickles A, Crouchley R. A comparison of frailty models for multivariate survival data. Stat Med. 1995;14(13):1447–61.
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102.
Bender R, Augustin T, Blettner M. Generating survival times to simulate cox proportional hazards models. Stat Med. 2005;24(11):1713–23.
Crowther MJ: Merlina unified modelling framework for data analysis and methods development in Stata. 2018. https://arxiv.org/abs/180601615. Accessed 30 Nov 2020.
Crowther MJ, Lambert PC. Simulating biologically plausible complex survival data. Stat Med. 2013;32(23):4118–34.
Crowther MJ, Look MP, Riley RD. Multilevel mixed effects parametric survival models using adaptive Gauss–Hermite quadrature with application to recurrent events and individual participant data metaanalysis. Stat Med. 2014;33(22):3844–58.
Divala TH, Mungwira RG, Mawindo PM, Nyirenda OM, Kanjala M, Ndaferankhande M, et al. Chloroquine as weekly chemoprophylaxis or intermittent treatment to prevent malaria in pregnancy in Malawi: a randomised controlled trial. Lancet Infect Dis. 2018;18(10):1097–107.
Hougaard P. Life table methods for heterogeneous populations: distributions describing the heterogeneity. Biometrika. 1984;71(1):75–83.
Acknowledgements
We thank all the study participants and data collectors in the clinical trial.
Funding
The clinical trial was supported by funding from NIH (grants U01AI087624 and K24AI114996 to MKL).
NP was supported by the Fogarty International Center of the National Institutes of Health [Grant number D43TW010075 to MKL and DM].
NP was also supported by South African Research Chairs Initiative (SARChI) of the Department of Science and Technology and National Research Foundation (grant 64788) awarded to Dr. Ikechukwu Achilonu (SARChI Chair in Protein Biochemistry and Structural Biology), School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, South Africa.
Author information
Authors and Affiliations
Contributions
Study idea conceptualization: NP, TC, MM, MJCE. Methodology development: NP, MM, LK, TC, MKL, DM, MJCE. Analysis and first draft development: NP. Supervision, writing revised draftsreviewing and editing: MM, MKL, LK, TC, DM, MJCE. Enrolled patients and conducted clinical investigation: MKL. Reading and approving the final manuscript: NP, MM, MKL, LK, TC, DM, MJCE. The author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The current study has been performed in accordance with the relevant regulations and guidelines. The current study is based on computer simulations where no human subjects are used. The current study was approved by the University of the Witwatersrand’s Human Research Ethics Committee. The Chloroquine for Malaria in Pregnancy trial obtained ethical approval from the Institutional Review Board at the University of Maryland, the College of Medicine Research and Ethics Committee at the University of Malawi and the Malawi Pharmacy Medicines and Poisons Board. Informed consent was obtained from all the participants.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Patson, N., Mukaka, M., Kazembe, L. et al. Comparison of statistical methods for the analysis of recurrent adverse events in the presence of nonproportional hazards and unobserved heterogeneity: a simulation study. BMC Med Res Methodol 22, 24 (2022). https://doi.org/10.1186/s12874021014758
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874021014758