Estimation of dynamical model parameters taking into account undetectable marker values

Background Mathematical models are widely used for studying the dynamic of infectious agents such as hepatitis C virus (HCV). Most often, model parameters are estimated using standard least-square procedures for each individual. Hierarchical models have been proposed in such applications. However, another issue is the left-censoring (undetectable values) of plasma viral load due to the lack of sensitivity of assays used for quantification. A method is proposed to take into account left-censored values for estimating parameters of non linear mixed models and its impact is demonstrated through a simulation study and an actual clinical trial of anti-HCV drugs. Methods The method consists in a full likelihood approach distinguishing the contribution of observed and left-censored measurements assuming a lognormal distribution of the outcome. Parameters of analytical solution of system of differential equations taking into account left-censoring are estimated using standard software. Results A simulation study with only 14% of measurements being left-censored showed that model parameters were largely biased (from -55% to +133% according to the parameter) with the exception of the estimate of initial outcome value when left-censored viral load values are replaced by the value of the threshold. When left-censoring was taken into account, the relative bias on fixed effects was equal or less than 2%. Then, parameters were estimated using the 100 measurements of HCV RNA available (with 12% of left-censored values) during the first 4 weeks following treatment initiation in the 17 patients included in the trial. Differences between estimates according to the method used were clinically significant, particularly on the death rate of infected cells. With the crude approach the estimate was 0.13 day-1 (95% confidence interval [CI]: 0.11; 0.17) compared to 0.19 day-1 (CI: 0.14; 0.26) when taking into account left-censoring. The relative differences between estimates of individual treatment efficacy according to the method used varied from 0.001% to 37%. Conclusion We proposed a method that gives unbiased estimates if the assumed distribution is correct (e.g. lognormal) and that is easy to use with standard software.


Background
Dynamical models based on system of differential equations have been successfully used for a better understanding of the pathogenesis of infectious diseases [1,2]. Two landmark papers appeared in 1995 demonstrating the high turnover of the human immunodeficiency virus (HIV) and infected CD4+ T lymphocytes cells [3,4]. Using such dynamical models, Neumann et al. [5] gave some insight in the effect of interferon based therapy used to treat patients infected by hepatitis C virus (HCV). Moreover, the estimate of the percentage of virus production blocked by the therapy is now widely used in this field [6][7][8][9][10] to evaluate the efficacy of treatment regimens in various contexts such as patients co-infected with HIV and HCV.
Although dynamical models parameters such as virus clearance or treatment efficacy are very useful, their estimation is most often performed for individual subjects separately. The limitations of such statistical approach as well as the interest of hierarchical models have already been underlined [11,12]. The main advantage of hierarchical models (also called mixed/random effects models) is their ability to estimate all parameters at the same time, using all available data even in case of unbalanced data, i.e. the number of measurements can vary from one patient to another. Parameters can be estimated using a Bayesian approach [13][14][15] or other approaches [16]. Another advantage working with analytical solutions of the system of differential equations is that standard softwares for non linear mixed models can be used [17].
Nonetheless, a major problem arises when using viral load data. The assays used to quantify HIV or HCV RNA are limited by a detection threshold that may lead to undetectable values when the true viral load is below this threshold. From a clinical point of view, the aim of any treatment is to reduce the viral load as much as possible [18]. Therefore, the practical definition of virological response is the occurrence of sustained undetectable values. The threshold of undetectable values is changing with the improvement of the assays for quantifying the viral load. When analysing viral load as a continuous variable, the left-censored measures are most often analyzed by replacing their value by an arbitrary value (e.g. threshold or half of the threshold). Although the sensitivity of the assays is improving, this limitation still persists and has already been underlined in the context of dynamical models [12,15]. Methods to take into account left-censored repeated measures in linear mixed models [19][20][21] or in non linear mixed models [15] have already been proposed. In this paper, we show how such an approach can be implemented using standard software in the case of non linear mixed models. Furthermore, we evaluate the impact of not taking into account undetectable values when studying HCV dynamics in the context of a phase II randomised clinical trial for the treatment of HCV infection in HIV co-infected patients.

Study example
The motivating study was a phase II randomised clinical trial evaluating the efficacy of pegylated-interferon (PEG-IFN)-α2a and Ribavirin (RBV) for the treatment of HCV infection among 17 HIV co-infected patients who had already been treated for HCV [22]. HCV RNA quantification was performed at least three times within the first 4 weeks (W): W0 (treatment initiation), W2 and W4. In 8 patients, blood samples were collected more intensively with additional measures at 6 hours (H6), H12, day 1 (D1), D2, D4, W1 and W3. Patients were followed until W72 for final evaluation of the virological response but the study of viral dynamics was restricted to the first 4 weeks because of model assumptions (see below). The concentration of plasma HCV RNA was determined using a quantitative reverse transcription polymerase chain reaction (RT-PCR) assay (Cobas Amplicor HCV Monitor Test, version 2.0; Roche Molecular Systems). The lower detection limit of this assay was 600 IU/mL, i.e. 2.78 log10 IU/ mL. Of note, one international unit (IU) equals approximately 2.2 copies/mL.

Mathematical model
The model used to estimate HCV dynamics was first described by Neumann et al. [5] with the following differential equations: where T is the number of target cells (i.e. hepatocytes), I the number of productively infected cells and V the plasma HCV viral load. Target cells are produced at rate s (per day) and die at rate μ. The number of cells which become infected per day is proportional to the number of circulating virions and available target cells with a proportionality constant β (infection rate). Infected cells die at rate δ per day. HCV virions are produced at a rate p per infected cells per day and are cleared at a rate c per day. In the present model, the HCV treatment is supposed to reduce the production of virions from infected cells by a fraction (1-ε). The possible effect of IFN as well as RBV on de novo rate of infection [5] or on infectivity by producing 3 ε a fraction of non-infectious virions [23] have been discussed. For the purpose of this paper, we assumed only a combined effect of both drugs on production rate of new virions because this measure was the most widely used by other investigators [6][7][8][9].
When working on a short period of 2-4 weeks, it sounds reasonable to consider that the number of uninfected hepatocytes (T) remains constant (equal to the baseline value) because of the slow turnover of these cells [5]. Therefore, assuming a pre-treatment steady-state, the analytical solution of the equations (2) and (3) with T constant is: The viral decay is assumed to begin at t 0 = 0.25 day (6 hours), corresponding to the drugs pharmacokinetics [5].

Hierarchical formulation
The previous notations do not account for patient/measurement level. Most often parameters of such models are estimated patient by patient assuming Gaussian, homoskedastic measurement error. A more valid and powerful approach is based on a hierarchical formulation of the model [11] that can distinguish at least two levels of variation. Hence, for the j th measurement of a subject i performed at a time t ij : -Stage 1: intra-patient variation The outcome is the logarithm (base 10) of the true viral load (function of t ij and θ i , the p-vector of model parameters) plus a Gaussian measurement error e. I ni is a identity matrix of dimension n i × n i , n i being the number of measurements available for the subject i.
-Stage 2: inter-patient variation is the p-vector of average (fixed) effect in the whole study population and γ i is a q-vector (q ≤ p) of random effects for correcting θ for each subject (random effect). Actually, θ is a log-transformation of original parameters that have several advantages including a positivity constraint for original parameters. Random effects γ i were assumed to be normally distributed with a variancecovariance matrix D. θ i are estimated through Empirical Bayes estimates.

Model likelihood
As presented in more details elsewhere [24], the method proposed to take into account left-censored values when estimating parameters is to maximise a full likelihood dis- given the random effects. The calculation of this likelihood leads to the integration over u = u 1 ,u 2 ,...u q , that is a multiple integral of dimension q. Therefore, with this method, rather than imputing a fixed value of undetectable viral load, one assumes that left-censored values are completing the Gaussian distribution of Y i . A crude approach assumes that left-censored values contribute like observed values, being equal to the value of the threshold or any other given value. In this case, the likelihood is simply:

Algorithm and implementation
The maximisation of the likelihood function can be performed with standard software such as NLMIXED in SAS ® [24]. Using this procedure, the default algorithm is a quasi-newton algorithm and the calculation of the multiple integral is performed by adaptative quadrature. An example of code used for this paper is provided in appendix.

Simulation study
Simulations were performed to compare the bias on parameter estimates when taking into account left-censoring or not. Using the analytical solution (4) and allowing a random individual variation for the initial viral load and treatment efficacy, parameters to estimate were: QV 0 , ε, δ c, To constraint parameters to be in the correct range, estimations were performed on transformed parameters (for the study on real data, as well) using a logarithm function for δ, c and logit function for ε (bounding ε between 0 and 1). In the simulation study, we fixed t 0 = 0 but results were similar with t 0 = 0.25.
Values for model parameters were defined according to the results reported in the literature of HCV dynamics [6]. In our application where patients were previously treated and dually infected by HIV and HCV, the estimate of treatment efficacy is less than those usually reported in naïve patients mono-infected with HCV [5].

Simulation study
Results of the simulations are shown in Table 1. On average, 14% of simulated measures of HIV RNA were leftcensored when the treatment efficacy was ε = 80%. Crude estimates provided by standard likelihood (6) maximisation, replacing left-censored values by the value of the threshold, were dramatically biased with the exception of V 0 . In particular, with only 14% of left-censored measures, infected cells death rate δ and clearance of virus c were underestimated by 55 and 45 percent, respectively. Treatment efficacy (ε) was overestimated by 16%. Variances of random effects were also significantly biased: +20% and -19% for random effects on V 0 , and ε, respectively. The residual variance was overestimated (+133%).  When taking into account left-censoring, the relative bias on estimates was ≤ 2 % for all parameters but variance parameters. However, the biases on variance parameters significantly decreased (e.g. the bias on changed from -14% to -0.4%) when increasing the number of subjects included in the sample (e.g. N = 200).

Application
Model parameters were estimated using the HCV RNA data available during the first 4 weeks following treatment initiation in the 17 patients included in the ROCO2 trial. Among these 100 available measurements, 12 were undetectable, i.e. left-censored.
Estimates of parameters taking into account left-censoring or not are shown in Table 2. Differences between estimates according to the method used were large on the death rate infected cells, δ.
Using the crude approach, the estimate was 0.13 day -1 (95% confidence interval [CI]: 0.11; 0.17) compared to 0.19 day -1 (CI: 0.14; 0.26) when taking into account leftcensoring. These estimates correspond to half-life (t 1/2 ) of infected cells of 5.3 days (t 1/2 = ln(2)/δ) and 3.6 days, respectively. Differences between estimates for the other fixed parameters were less important. Furthermore, the confidence intervals of the estimates were larger when taking into account left-censoring ( Table 2).
The impact of the method used to estimate the parameters on individual viral load predictions is illustrated in Figure  1. For the first three patients (102, 108 and 201), the decrease of the second part of the viral load slope was more pronounced when taking into account left-censoring. Actually, left-censoring tended to occur on the last measurements depending on the treatment efficacy and the baseline level of viral load. The apparent discrepancy with observed values is obviously due to the fact that undetectable values are plotted on the detection limit (2.78 IU/mL) although the true value is below this threshold. This result is expected as the slope after the shoulder is proportional to the infected cell death rate (δ). As expected for the last patient (206) who did not have any undetectable viral load within 4 weeks, both predictions were very close.
The relative differences between estimates of individual treatment efficacy (ε i = ε + γ 1i ) according to the method used varied from 0.001% to 37%. As expected from simulation results where treatment efficacy tended to be overestimated with the crude approach and from the estimate of the average (fixed) effect ε, the estimated effect was most often higher with the crude approach compared to the corrected one. For instance, the estimate of treatment efficacy in patient 101, was 36% and 45% when taking into account left-censoring or not, respectively. On the contrary, for the patient 201, the estimates were 97% and 93%, respectively. Of note, the model was able to predict viral load changes for the patient 201 thanks to the information provided by the other patients with more numerous measurements available. This is an illustration of the advantage of hierarchical models.

Discussion
In this paper, the impact of taking into account left-censored (undetectable) HCV RNA values was illustrated on the estimation of dynamical models based on a system of differential equations. Although, the proportion of undetectable values was quite low (12%), there were clinically significant differences, particularly in estimate of mean half-life and individual treatment efficacy. Such a result is important because all these parameters are of interest. Treatment efficacy evaluation through dynamical model is broadly used in HCV infection for instance.
We observed smaller biases from the crude approach applied to the real dataset compared to simulation results. However, some parameters values were different to those used in the simulations such as δ (0.13 vs. 0.40). Simulations using values estimated with real data led to smaller biases as observed in the present application (data not shown). The overestimation of the treatment efficacy by the crude approach may appear counter-intuitive because the imputation of the value of the threshold artificially limits the decrease of viral load. However, it is difficult to anticipate the impact of left-censoring in dynamical models because of the complex relationship between parameters, particularly between ε and δ [23]. In the present study, the imputation of the value of threshold level to undetectable viral loads led to a higher level of HCV RNA than the truth, particularly in the second part of the dynamics. The death rate of infected cells (δ) is one of the main parameters influencing viral load levels in this period [5,25]. This explains the underestimation of this parameter. On the other hand, an overestimation of treatment effect on viral production (ε) is needed to obtain a Observed and predicted HCV RNA values in four patients Half-life of infected cells helps in understanding how high is the turnover [3,26,27]. Previously published results [25] can be used to illustrate the size of the impact of leftcensoring on HIV infected cells turnover. Differences in estimates of half-life of infected long-lived cells as large as those we reported in HCV would lead to halve the time needed to treat to achieve virus eradication (assuming no viral reservoir). Compared to results with piecewise linear mixed models commonly used with surveillance data (monthly to 6-months intervals between measurements) of HIV RNA [19,20], the estimates of the parameters are more sensitive to undetectable values in the context of dynamical models with highly repeated measurements. Moreover, confidence intervals of estimates were larger when taking into account left-censoring compared to simple imputation that tends to artificially decrease the variability, as previously reported with linear models [20].
The method presented in this paper is easy to implement in standard software. One limitation is that it is based on analytical solutions of the system of differential equations. However, looking at the applied papers on HCV infection, the authors used most often the same model with the same assumptions leading to the same analytical solution. Using hierarchical models taking into account left-censoring should improve the validity of estimation and may help in case of convergence difficulty when using individual data [9]. More complex mathematical models have been proposed to fit additional markers such as liver enzymes level [28] or pharmacokinetics data [29]. In this case, more general approaches based directly on numerical solution of the differential equations should be used [13,15]. Another limitation of the proposed methods is the assumption of log-normal distribution of viral load measures. In our experience, it is most often a reasonable assumption in the case of circulating HIV virus and this could be checked from residuals [20,30]. However, if this assumption is not tenable, extensions based on mixture distributions (log-normal and binary) can be used and are also easily implementable in software [21].

Conclusion
Imputing a single value to left-censored measures of viral load is a wrong assumption and stronger than assuming a given distribution for the whole measurements. We proposed a method that gives unbiased estimates if the assumed distribution is correct (e.g. lognormal) and that is easy to use with standard software.