 Research article
 Open Access
 Published:
Lasso regularization for leftcensored Gaussian outcome and highdimensional predictors
BMC Medical Research Methodology volume 18, Article number: 159 (2018)
Abstract
Background
Biological assays for the quantification of markers may suffer from a lack of sensitivity and thus from an analytical detection limit. This is the case of human immunodeficiency virus (HIV) viral load. Below this threshold the exact value is unknown and values are consequently leftcensored. Statistical methods have been proposed to deal with leftcensoring but few are adapted in the context of highdimensional data.
Methods
We propose to reverse the BuckleyJames least squares algorithm to handle leftcensored data enhanced with a Lasso regularization to accommodate highdimensional predictors. We present a Lassoregularized BuckleyJames least squares method with both nonparametric imputation using KaplanMeier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Crossvalidation for parametertuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lassoregularized BuckleyJames least square method was compared to simple imputation strategies to predict the response to antiretroviral therapy measured by HIV viral load according to the HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:2740). The proposed methods were also assessed on simulated data mimicking the observed data.
Results
Approaches accounting for leftcensoring outperformed simple imputation methods in a highdimensional setting. The Gaussian BuckleyJames method with crossvalidation based on the appropriate loss function showed the lowest prediction error on simulated data and, using real data, the most valid results according to the current literature on HIV mutations.
Conclusions
The proposed approach deals with highdimensional predictors and leftcensored outcomes and has shown its interest for predicting HIV viral load according to HIV mutations.
Background
Leftcensoring due to the lower detection limit of an assay is a common problem in many fields including biology, chemistry, and the environmental sciences. One example is the quantification of the human immunodeficiency virus (HIV) viral load in plasma. The sensitivity of assays has improved and the detection threshold has decreased from 10,000 copies/mL to 20 or fewer copies/mL today. Several statistical methods have been proposed to account for leftcensoring of such quantitative variables in crosssectional (with one measure per subject) and longitudinal (with several measures per subject) studies. Standard methods include multiple imputation [1–4], reverse survival analysis methods [2, 5–7], quantile regression [8, 9] and censored quantile regression [10, 11]. Furthermore, the Tobit model with censored outcome which is supposed to be normally distributed can be estimated by maximum likelihood [12–18] or by the BuckleyJames estimator [18, 19]. Indeed, HIV viral load appears to have an underlying Gaussian distribution truncated by the detection limit that justifies the normality hypothesis [13–15, 17, 18]. As expected, approaches accounting for leftcensoring outperform simple imputation of a constant [2, 4, 13–16, 18, 20–22].
Another issue may arise when the number of predictors (p) is high compared to the number of statistical units (n), without excluding the possibility that n<p. This is known as high dimensionality. In the context of HIV infection, this can be illustrated by analyzing the association between the presence of HIV mutations and the response to antiretroviral therapy which is measured by HIV viral load. HIV strains circulating in a given individual can present mutations associated with antiretroviral treatment failure (detectable HIV viral load), also called HIV drug resistance mutations. Thus, genotypic tests allowing the detection of HIV drug resistance mutations are commonly performed in patients starting a new antiretroviral regimen or even in newly HIVinfected patients because of the transmission of resistant strains [23–26]. Lasso linear [27, 28] and logistic regressions [29], principal component and partial least square logistic regressions [30], and multiple testing correction [31] have been used to deal with more than 100 predictors and fewer than a few hundred of patients, a common situation in this context [32].
These studies use a dichotomized outcome or simple imputation by a constant to circumvent the problem of censoring. One limitation of dichotomizing a continuous outcome is the loss of information and hence power. In addition, success is usually defined as achieving an undetectable HIV viral load. However, the detection limit, although not random, depends on several factors that differ from one study to another. Thus, there is no reason, except convenience, for the detection limit to correspond to the threshold for dichotomization.
We hypothesize that approaches accounting for leftcensoring will exhibit better results compared to simple imputation strategies in a highdimensional setting similar to what has been found in lowdimensional settings.
Some works have simultaneously addressed both censoring and highdimensional problems using the Lasso [33–43], partial least squares [44], random forests [45], support vector machines [46], and deep learning [47]. These examples were developed for rightcensored survival data. A main approach to leftcensored data analysis is based on methods typically used with rightcensored survival data such as the BuckleyJames estimator. Leftcensored data are then previously reversed to rightcensored data. While from a statistical point of view, the nature of the outcome (timetoevent or quantitative measurement below a limit of detection) is secondary, this can impact the choice of adequate probability distribution functions and other practical issues.
We propose a Lassoregularized BuckleyJames least squares method with both, nonparametric imputation using KaplanMeier and parametric imputation based on the Gaussian distribution. The nonparametric Buckley–James estimator, which simply replaces censored residuals by their conditional expectations in an iterative way, has been previously applied to leftcensored HIV viral load data in a crosssectional study [18]. On the other hand, the Lasso extension of the nonparametric Buckley–James method has been proposed for rightcensored data [36, 38, 40, 48]. Our contribution consists in using the latter method for leftcensored outcomes and highdimensional predictors. Furthermore, we propose an original parametric version of the BuckleyJames method, which is adapted to the typical assumption of a Gaussian distribution of HIV viral load. We demonstrate the value of these approaches by comparing them to Lasso linear regression with simple imputation [28] for predicting the response to antiretroviral therapy by HIV genotypic mutations.
Our primary objective is to predict as accurately as possible responses in future patients who will switch to a similar regimen. Thus, comparisons are based on mean square prediction error. The prediction performances of the different methods were assessed on simulated data that reproduced the observed data. Then, methods were applied to data obtained in a collaborative study from clinical trials and cohorts provided by the Standardization in Clinical Relevance of HIV Drug Resistance Testing Project from the Forum for Collaborative HIV Research [49]. The actual data presented a moderate censoring rate of 26 %, i.e. a realistic magnitude [18, 50]. However, high (around 50 %) or even severe (around 70 %) censoring rates could be observed in older studies with a high limit of detection (LOD) or particular populations with low treatment failure rate, e.g. HIV controllers [18, 51, 52]. Thus, we also explored the impact of high and severe censoring rates on performance.
We detail how to use publicly available R packages to compute Lasso estimates with leftcensored data.
Finally, we discuss possible extensions and applications of our work.
Methods
Methods to analyze leftcensored outcome
In this section, we review the simplest models and estimation methods used to deal with leftcensoring in crosssectional studies. For a more extensive and comprehensive review of these methods, see [2, 18]. Thereafter, we consider the Lasso extension of those methods that support simple implementations.
The linear model
First, consider the general linear regression model
where X_{i} is a pvector of fixed predictors, Y_{i} is the uncensored continuous random outcome variable, β=(β_{1},⋯,β_{p})^{⊤} is a pvector of unknown regression parameters and ε_{i} are independent and identically normally distributed random variables with mean 0 and constant variance σ^{2}. Let X be the n×p matrix X=(X_{1},…,X_{n})^{⊤} and Y the n×1 vector Y=(Y_{1},…,Y_{n})^{⊤}. The intercept is omitted in the model for simplicity, and all predictor variables are assumed to be standardized (i.e. zero mean and unit variance).
Lasso on complete data
The Lasso (Least Absolute Shrinkage and Selection Operator) [53] is one of the most popular methods in highdimensional data analyses. It allows for simultaneous estimation and variable selection and has efficient algorithms available. It is considered here as the Gold Standard for our simulation studies. The Lasso estimator of parameters in model (1) is:
where \(\left \ \mathbf {Y}  \mathbf {X} \boldsymbol {\beta } \right \^{2}_{2} = {\sum \nolimits }_{i=1}^{n} \left (Y_{i}  \mathbf {X}_{i} \boldsymbol {\beta } \right)^{2}\) is the quadratic loss, \(\left \ \boldsymbol {\beta } \right \_{1} = {\sum \nolimits }_{j=1}^{p} \left  \beta _{j} \right \) is the Lasso penalty on the parameter size, and λ>0 controls the amount of regularization. When λ is large enough (which depends on data), all coefficients are forced to be exactly zero. Inversely, λ=0 corresponds to the unpenalized ordinary leastsquares estimate.
This model on complete data (no leftcensored measures) is considered as a reference when comparing the other methods applied to incomplete datasets that include leftcensored values.
The Tobit model
Because of the detection limit, Y_{i} can be leftcensored. Let LOD_{i} be the (fixed and known) censoring threshold of subject i. To simplify, we consider LOD_{i}=LOD. Z_{i} is the observed response. The socalled Tobit model [12] can be defined as:
where Y_{i} is the response variable defined in model (1). We can equivalently write:
where \(\delta _{i} = \mathbb {I}_{\left (Y_{i} > \text {LOD}\right)}\) is a censoring indicator. The idea behind the Tobit regression model is to deal with the leftcensored variable Z as the outcome of a normally distributed latent variable Y.
Simple imputation
Simple imputation is a substitution method that replaces leftcensored values with a single value, LOD. LOD/2 is another common choice. Let \(\hat {\boldsymbol {\beta }}_{LOD}\) be the ordinary least squares estimate of model:
Simple imputation is widely used for its simplicity. However, replacing any censored observation by a single value may lead to biased parameter estimates.
Beerenwinkel et al. [28] applied Lassoregularized linear regression with the naïve approach of replacing the unobserved undetectable value with the limit of detection of the assay. Then the Lasso estimator of parameters in model (5) is:
Maximum likelihood estimation
In the Tobit model (1)(3), one can assume that when Z= LOD, the density function of Z is equal to the probability of observing Y≤ LOD and for Z>LOD the density function of Z is the same as the density of Y. The likelihood function takes the form:
When the Gaussian distribution for the outcome is assumed, the loglikelihood function can be written as:
with \(f_{G}\left (u, \mathbf {X}_{i}, \boldsymbol {\beta }, \sigma ^{2}\right) = \frac {e^{\frac {\left (u  \mathbf {X}_{i} \boldsymbol {\beta }\right)^{2}}{2 \sigma ^{2}}}}{\sqrt {2 \pi \sigma ^{2}}}\) the Gaussian probability density function of Y_{i} with mean X_{i}β and constant variance σ^{2} evaluated at u and \(F_{G}\left (v, \mathbf {X}_{i}, \boldsymbol {\beta }, \sigma ^{2}\right) = \int _{ \infty }^{v}f_{G}\left (u, \mathbf {X}_{i}, \boldsymbol {\beta }, \sigma ^{2}\right) \, \mathrm {d}u \) is the corresponding Gaussian cumulative distribution function evaluated at v. Let \(\hat {\boldsymbol {\beta }}_{MLE}\) be the maximum likelihood estimation obtained by maximizing (7). Extensions to other distributions have also been explored [54]. Several works have shown the superiority of this method [18, 20–22]. However, when the parametric model is misspecified, the sample size is small or the percent censoring is high, the maximumlikelihood estimation method has been shown to perform poorly [2].
The Lasso penalty applied to some likelihood function has become an established and relatively standard technique. However, when the likelihood function is a more complex function of the model parameter, such as the likelihood function for the Tobit model (7), adding a nondifferentiable penalty leads to a computational challenging optimization.
Quantile regression and censored quantile regression
Quantile regression, particularly least absolute deviations (LAD) regression, has been applied to leftcensored data [9]:
Median regression is a natural alternative to the usual mean regression in the presence of heteroscedasticity or when the normality assumption is violated. Simple imputation using robust regression may be less sensitive to the influence of censored observations.
Lassoregularized least absolute deviations regression has been investigated in the literature (e.g. [55]).
Powell [10] proposed the LAD estimate specifically for censored data:
Later, the approach was extended to more general quantiles [11].
The Lasso extension of censored quantile regression (basically, censored LAD) has been analyzed for rightcensored survival data [41, 42, 56] and specifically for leftcensored data [57–62]. Yu et al. [58] and Alhamzawi et al. [61] proposed Bayesian approaches using different hyperparameters priors. These methods rely on computationally intensive algorithms. In practice, applications are limited to the n>>p case. Others [57, 59, 60, 62] derived theoretical properties of the Lassoregularized censored least absolute deviations regression, but the algorithmic development was not a priority in these works and the practical use was limited to n>>p or not addressed. To our knowledge, there are no publicly available software tools that implement the Lasso extension of Powell’s approach and no simple implementation relying on existing packages seems straightforward.
Nonparametric BuckleyJames
Leftcensored outcome data can be analyzed using methods designed for rightcensored survival data by reversing the outcome scale. For instance, Gillespie et al. [6] proposed the reverse KaplanMeier and Dinse et al. [7] reversed the Cox method (though in the case of leftcensored exposures and uncensored outcome). After the Cox model, the accelerated failure time model is the most frequently used regression model for rightcensored survival data. It directly links the expected response to predictors, analogously to the classical linear regression approach. A popular method for fitting the accelerated failure time model is the BuckleyJames estimator [19], an extension of the least squares principle. The idea is to impute the censored values by their estimated conditional mean to provide censoring and predictor values:
with
where f(u,X_{i},β) is the (unknown) probability density function of Y_{i} with mean X_{i}β evaluated at u and F(u,X_{i},β) is the corresponding cumulative distribution function. By "flipping" the data (turning it from leftcensored to rightcensored), the application of algorithms previously developed for rightcensoring is direct and has been performed in other contexts [5]. We consider M an arbitrary constant that equals or exceeds the largest observation. Then subtract all uncensored and leftcensored outcomes from M. The leftcensored at LOD variable Z is then replaced by M−Z which is rightcensored at M−LOD. Let (M−Z_{i})^{∗} be imputed as \(\delta _{i} \left (M{Y}_{i}\right) + \left (1  \delta _{i}\right)\, \mathbb {E} \,\left (MY_{i} \left  M{Y}_{i}\geq M\text {LOD}, \mathbf {X}_{i}\right.\right)\). Then, we can calculate the conditional expectation by
where F(u,X_{i},β) is now the (unknown) cumulative distribution function of M−Y_{i} with mean M−X_{i}β evaluated at u, which can be estimated, for example, by KaplanMeier. The BuckleyJames estimate, \(\hat {\boldsymbol {\beta }}_{NonParBJ}\), can be computed using a semiparametric iterative algorithm that alternates between imputation of censored values according to (9) and leastsquares estimation.
The main drawback of this method is that convergence of the algorithm is not guaranteed. Due to the discontinuous nature of the estimating function (formulation (8) makes \(\hat {\boldsymbol {\beta }}_{NonParBJ}\) to be a piecewise linear function in β), the iterative procedure may oscillate between different parameter values. The problem is of practical importance in situations where the effect of predictors is small or in small samples [63] (which could be worse in highdimensional settings). To circumvent this problem, a onestep algorithm that stops at the first iteration is used in some works [36, 64]. This approach is close to a substitution method in which values below the detection limit are replaced by expected values of the missing measurements, provided they are less than the detection limit [65].
Several authors have proposed combining the iterative BuckleyJames imputation and methods handling highdimensional predictors: Johnson et al. [36, 48] and Cai et al. [38] used the Lasso, Wang et al. [37], used boosting, Wang et al. [40] used ElasticNet, Johnson et al. [66] and Li et al. [67] used the Dantzig selector, and Dirienzo et al. [68] used parsimonious covariate selection. The BuckleyJames estimate can be computed using an iterative algorithm that alternates between imputation of censored values according to (9) and the Lasso:
Gaussian BuckleyJames
Alternatively, the BuckleyJames imputation (8), assuming the logarithm of HIV viral load follows a Gaussian distribution, can be calculated with the conditional expectation:
where f_{G} and F_{G} are the Gaussian density and cumulative distribution functions defined in (7). Again, the solution can be computed by iteratively alternating between imputation based on (11) and parameter estimation using ordinary least squares, \(\hat {\boldsymbol {\beta }}_{GaussBJ}\), or the Lasso:
Graphical illustration in a lowdimensional setting
To illustrate the difference between estimation methods we generated data from the simple linear model (p=1): Y_{i}=X_{i}β+ε_{i} with i=1,⋯,n, X∼N(0,1), ε∼N(0,σ^{2}). β was set to 10 and σ^{2} was chosen such that the signaltonoise ratio was 4 :3. A limit of detection was then fixed to obtain the desired censoring rate: moderate, 20 %, high, 50 % or severe, 70 %.
In Fig. 1, predicted regression lines are obtained using different methods: the true model that generated the data, the gold standard (ordinary least squares with uncensored data), maximum likelihood estimation (MLE), which is identical to the Gaussian BuckleyJames estimation (BJ) when p=1, nonparametric BuckleyJames (BJ), least absolute deviations (LAD) and censored LAD regressions, simple imputation by the limit of detection (LOD) and by LOD /2.
Notice that simple imputation by LOD and LOD/2 are the most distant regression lines from the true and gold standard lines, in an opposite way: simple imputation by LOD tends to overestimate the response values while simple imputation by LOD/2 tends to underestimate them.
Maximum likelihood estimation shows one of the best behaviors, but the computational complexity dramatically increases with p (results not showed). Mean and median regressions with simple imputation by LOD are quite close and are the closest when the estimation situation is easy (high n, low censoring rate). The censored LAD shows better results than censored mean regression (MLE and BuckleyJames) for small sample size while the inverse is observed when n=100. Gaussian BuckleyJames and MLE are identical, but their differences increase when p increases (results not showed). In this i.i.d. generated from a Gaussian distribution example, the Gaussian BuckleyJames estimate shows better behavior than nonparametric BuckleyJames, the difference being higher when n is small.
Tuning parameter selection
Kfold crossvalidation is routinely applied to select the optimal regularization parameter when the main goal of the study is prediction. Data D is randomly chunked into K disjoint blocks of approximately equal size. To avoid a potentially unbalanced partition, we consider stratified Kfold crossvalidation, i.e. each fold contains roughly the same proportion of censoring as in the whole sample. \(\mathbf {D}_{\smallsetminus k}\) is the learning data, used to estimate coefficients. D_{k} is the test data, not used in the estimation process and then used to evaluate the loss function L. This Kfold crossvalidation can be written as:
CV is evaluated on a grid of λ–values. The highest value, λ_{max}, corresponds to the smallest value of λ for which all coefficients are zero. The lowest value, λ_{min}, corresponds to the unpenalized solution (when feasible). We choose the λ value that minimizes the CV function.
Squared error loss is one of the most widely used loss functions:
where n_{k} is the sample size of D_{k}. However, Y is a latent variable not fully observed due to the detection limit. This loss function could be used only for the gold standard in (2), with simulated data. Again, the simplest imputation strategy consists in replacing Y with Z, in D_{k}. Alternatively, BuckleyJames strategies could replace censored Y_{i} values in the test data D_{k} by their conditional expectation estimated using the learning data [48].
On the other hand, a loss function differentiating the contribution of uncensored and censored data would be useful. Assuming the Gaussian distribution of the HIV viral load (7), the following loss function could be derived:
where \(n_{k}^{\text {unc}}\) is the number of uncensored observations in D_{k}. The loss function L_{G} in (15) is proportional and equivalent to the negative Gaussian loglikelihood loss function, but allows for comparison with the squared loss in (14).
Implementation issues
All statistical analyses, comparisons and implementations were performed using the computing environment R (R Development Core Team, 2017) [69]. We used the function cv.glmnet from package glmnet [70] to choose the optimal λ value of Lasso linear regression on complete data (GoldS) and Lasso linear regression with simple substitution of leftcensored values by the detection limit (LOD). We implemented the Lasso nonparametric BuckleyJames (NonParBJ) using the bujar package [71]. We modified the function to support stratified Kfold crossvalidation and conserve the same proportion of censoring in all the folds. Lasso Gaussian BuckleyJames (Gaussian BJ) was implemented in a new function cvGaussBJ. Algorithm 1 specifies how to solve the problem. The stopping criterion is based on the difference between current and previous regression coefficient estimates, variance estimates, and imputed data. Because of the tendency to oscillate between different parameter values of the iterative procedure, the algorithm is also stopped if the number of oscillations is high [40]. Alternatively, we also considered the onestep algorithm, which stops at the first iteration [36, 64]. Crossvalidation based on both imputation and loss function accounting for censored and uncensored contributions is considered. The Lasso estimation step depends on package glmnet.
All these implementations and an artificial example are available at: https://github.com/psBiostat/leftcensoredLasso.
Prediction of HIV viral load from HIV genotypic mutations: real and simulated data
HIV is highly replicative and thus presents high mutation and recombination rates which could lead to the development of HIV drug resistance and consequently reduce the efficacy of antiretroviral treatment. To optimize the control of the evolution of HIV drug resistance, HIV viral load is routinely monitored to identify treatment failure, and HIV genotypic tests are commonly performed before a switch to a new treatment regimen in patients already treated or at the initiation of the first treatment in naive HIVinfected patients [72].
Our objective is to compare methods that handle leftcensoring by conditional imputation with methods that handle leftcensoring by imputing a single constant value (that is, the Lassoregularized linear regression with simple imputation by LOD and LOD /2) to predict of HIV viral load by HIV genotypic mutations. The methods accounting for leftcensoring by imputing the estimated conditional mean given censoring and predictor values are the Lassoregularized BuckleyJames least square algorithms (with/without Gaussian assumption, with complete convergence/1step, using crossvalidation based on imputation/loss function accounting for censored and uncensored contributions).
Real data
The database used in this study was provided by the Standardization and Clinical Relevance of HIV Drug Resistance Testing Project for the Forum for Collaborative HIV Research [49]. Patients included in this study were all treatmentexperienced and switched to an abacavircontaining regimen. The investigated drug, abacavir, is a nucleoside reverse transcriptase inhibitor (NRTI) that blocks HIV reverse transcriptase.
The sample size n=99 was slightly smaller than the number of predictors p=121. 54 of the 121 predictors correspond to the presence or absence of specific mutations in the reverse transcriptase gene (RTG), which were reported to be probably associated with resistance to abacavir, multiNRTI, NRTI (other than abacavir) or nonnucleoside reverse transcriptase inhibitors (NNRTI) at the time of the study [73, 74]. The number of mutations reported to be probably associated with resistance to abacavir or multiNRTI is low (14%). The other 67 predictors correspond to the presence or absence of specific mutations in the protease gene (PG) reported to be probably associated with resistance to one or several protease inhibitors (PI) at the time of the study [73, 74]. The number of molecules, including abacavir, ranged from 1 to 6 (with the median number of molecules being 3 and interquartile range 2). In particular, a PI was prescribed in 59% of the patients and 43% received an NNRTI. The response variable is the logHIV viral load measured at t_{8} (8 weeks after treatment initiation at t_{0}). LOD was fixed at 100 copies/mL and the censoring rate was moderate (26%).
Generation of simulated data
HIV viral load appears to have an underlying Gaussian distribution when logtransformed. Therefore, our outcome, \(Y^{(8)}_{i}\) is generated from a Gaussian distribution. We simulated 200 data sets of size n=100 and p=100 predictors from the model:
where

\(Y_{i}^{(0)}\), the HIV viral load at t_{0} is generated by a normal distribution with mean 12 (log10 copies/mL) and variance 1.

\(\beta _{1}^{(0)}\) represents the change of the slope between the HIV viral load on the day of treatment, t_{0}, and 8 weeks later, at t_{8}, when no mutations are present and for 1 log_{10}/mL higher concentration of viral load at t_{0}.
We fix β_{0} and \(\beta _{1}^{(0)}\) to obtain the desired censoring rates: 20% (moderate), 50% (high), and 70% (severe).

X_{(n×p)}, representing the presence or absence of HIV mutations, is generated by a multinomial distribution with mean 0.15 (the fixed prevalence for all the 100 mutations) and covariance matrix Σ where Σ_{ij}=0.4^{i−j} (the closer the mutations, the more positively they are correlated).

β=(β_{1},⋯,β_{p})^{⊤}. Among p=100 candidate mutations only 10% are relevant with effects β_{j}=1, if j=1,…,10 and 0 if j>10. A 1unit increase in HIV viral load is expected per occurrence of these relevant mutations for a given baseline HIV viral load.

ε is generated from a normal distribution with mean 0 and variance σ^{2} chosen such that the signaltonoise ratio is fixed at 3:1.
Our primary goal was to compare competing methods in terms of prediction accuracy. Consequently, we simulated training and test datasets. The former were used to estimate, the latter were used to evaluate the prediction performance. We ensured that training and test datasets contained roughly the same proportions of censoring. We computed the mean squared error on test data as:
with \(\hat {\boldsymbol {\beta }}(\hat \lambda)\) estimated on training data by stratified 5fold crossvalidation using a given regression method (Gold standard, Lassoregularized Nonparametric and Gaussian BuckleyJames with complete convergence and 1step and Lassoregularized linear regression with simple imputation by LOD and LOD /2) and the corresponding loss function in the crossvalidation criterion. The gold standard uses (14), the others replace the censoring values according to their imputation strategy, and the Gaussian BuckleyJames also uses the loss function in (15).
Results
Simulation results
Figure 2 shows the mean prediction error results. "OLS on True Model" corresponds to ordinary least squares for the linear model with uncensored data and only relevant predictors (the socalled oracle estimator). Gold standard (GoldS) corresponds to the Lasso estimation for the linear model with uncensored data. These results allow for reference prediction errors when the true model is known (the first one) or due to censoring data (both).
The imputation by LOD /2 led to poorer results than the imputation by LOD. Thus, for the simple imputation, only LOD imputation (LOD) results are shown. For the Gaussian BuckleyJames algorithms (Gaussian BJ), the error is calculated by using both crossvalidation with imputation and crossvalidation with the loss function indicated in (15), but only the best results are shown.
The Gaussian BuckleyJames method presented an oscillating behavior in 9.5% of the generated samples when the convergence rate was 20%. This percentage rose to 82.5% and 95.0% when the convergence rates were 50% and 70%, respectively. For the Gaussian BuckleyJames using the 1step algorithm (Gaussian BJ 1Step), results using the two crossvalidation approaches were almost identical. Nevertheless, for the Gaussian BuckleyJames with complete convergence, a notable improvement was obtained when applying (15).
The higher the rate of censoring, the less information is available to train the models and, unsurprisingly, the higher is the prediction error. For a moderate rate of censoring (20%), all methods show a good performance close to that of the gold standard GoldS. When the rate of censoring is 50%, Gaussian BJ shows the lowest prediction error, followed by Gaussian BJ 1step, NonParBJ and finally simple imputation, which shows more errors. The same patterns but more pronounced were observed with a severe rate of censoring (70%). Taking the knowledge about the distribution into account appears to have only a slight impact. Simple imputation yields the poorest results. In addition, it showed high variability with some extreme errors.
Application to real data
The Lassoregularized BuckleyJames least square algorithm that showed the best behavior in the simulation study (with complete convergence and crossvalidation based on the loss function L_{G} in (15)) was applied to real data, as well as the Lassoregularized nonparametric BuckleyJames method and simple imputation (by LOD and LOD /2).
Regularization parameters were estimated by stratified crossvalidation in order to ensure that each fold had the same proportion of censoring as in the corresponding data set (26%). In addition, because some mutations were relatively infrequent, we used 20fold crossvalidation. Indeed, the higher the number of folds, the lower the probability of randomly obtaining test sets with no subject exposed to infrequent HIV drug resistance mutations.
Figure 3 shows two examples of the observed HIV viral load at t_{0} and the observed and estimated HIV viral load at t_{8}. As in the lowdimensional case shown in Fig. 1, simple imputation by LOD predicted the highest values of HIV viral load. Inversely, simple imputation by LOD /2 estimates the lowest values of HIV viral load. The difference between the two estimates at 8 weeks is >0.5 log_{10} copies/mL, which is clinically relevant. Lassoregularized BuckleyJames least square algorithms (with/without Gaussian assumption), often gave a prediction in between. This tendency was increased when the censoring rate was high or when the sensitivity of the assay was low.
When applying the Gaussian BuckleyJames method to real data, no oscillating behavior was observed.
Table 1 indicates the number of HIV genotypic mutations selected from the list of mutations that may contribute to a reduced virologic response known at the time of the study [73, 74], according to each method applied. The Lassoregularized Gaussian BuckleyJames selected several HIV genotypic mutations suspected of being associated with abacavir or multiNRTI resistance. Furthermore, it selected a high number of HIV genotypic mutations probably associated with PI resistance and a few probably associated with NNRTI resistance. This selection of a large number of candidate predictors seems to be relevant because all patients received an abacavircontaining regimen and a high percentage of patients received regimens including a PI and/or NNRTI. The Lassoregularized nonparametric BuckleyJames selected fewer mutations, and especially fewer mutations in PG, probably due to PI resistance. Simple imputation of the LOD or LOD /2 selected few mutations. In particular, only 1 of the 5 mutations in RTG probably associated with abacavir resistance was retained.
Discussion
Simple imputation of the detection limit or of half of this limit is an ad hoc approach to address leftcensored outcome data. However, in standard (lowdimensional) settings, it leads to biased estimates of parameters and standard errors. In our highdimensional simulation study, simple imputation using Lassoregularized leastsquares showed poor performance. As in lowdimensional settings, approaches accounting for leftcensoring outperformed simple imputation.
In this work, we propose a Lassoregularized Gaussian BuckleyJames algorithm, according to the usual Gaussian assumption of logtransformed HIV viral load. Because of the wellknown convergence problems of the iterative BuckleyJames procedure, we implemented two algorithms, the first algorithm running until convergence and the second one being stopped after one step [36, 64]. This onestep algorithm showed similar results in the simulation study. Other solutions have been proposed to deal with convergence problems in lowdimensional settings [39, 64] and could be investigated in future research.
As in other works [48], we implemented a crossvalidation criterion for the tuning parameter based on imputing values to Y_{i} in the test set from conditional expectations estimated using the learning set. We also proposed a crossvalidation criterion based on a loss function that accounts for the different contribution of censored and uncensored values. Almost identical results were obtained when applying the two crossvalidation criteria to the onestep algorithm. However, when running the algorithm until convergence, better results were obtained with the crossvalidation criterion based on a loss function that accounts for censored and uncensored contributions.
On the other hand, we reversed the Lassoregularized nonparametric BuckleyJames method previously applied to rightcensored survival data [36, 38, 40, 48] in order to apply to leftcensoring due to detection limits. Foreseeably, in our homoscedastic Gaussian outcome data scenario, the Gaussian BuckleyJames showed better behavior than the nonparametric algorithm. However, accounting for the knowledge about the distribution seems to have had a slight influence. When the Gaussian assumption is violated, nonparametric imputation using Kaplan Meier is perhaps the best option.
We provide a publicly available R code to compute the methods introduced in this work (https://github.com/psBiostat/leftcensoredLasso). It would be interesting to compare the Lassoregularized BuckleyJames least squares method to Lassoregularized censored LAD method. The Lasso extension of censored LAD has been proposed in different works [41, 42, 56–62]. However, to our knowledge, there is no publicly available implementation, and no simple implementation relying on existing packages seem straightforward. Moreover, several works have shown the superiority of maximum likelihood estimation in lowdimensional settings when the Gaussian assumption is valid [18, 20–22]. Nevertheless, optimization strategies for complex likelihood functions (such as that in Eq. 7) including penalties that are not smooth are not obvious.
To illustrate the application of the methods on real data, we consider a data set from the Standardization and Clinical Relevance of HIV Drug Resistance Testing Project for the Forum for Collaborative HIV Research. The data set used to illustrate the initial data set is characterized by a sample sizetopredictors ratio of around 1. There is no gold standard to measure and compare predictive performance of the different methods when using censored outcome data. All patients were being treated with abacavir, an NRTI, so we expected our methods to select a high number of HIV genotypic mutations known to contribute to abacavir and NRTI resistance. Furthermore, a high number of patients were on PI and/or NNRTIcontaining regimens, and a selection of several HIV genotypic mutations reported to be probably associated with resistance to any of these molecules was also expected [73, 74]. In that sense, the Gaussian and nonparametric BuckleyJames methods showed more coherent results with the literature compared to simple imputation.
Otherwise, the data presented a moderate censoring rate of 26%, which is a realistic magnitude [18, 50] in studies measuring HIV viral load. However, high or even severe censoring rates were found in older studies with a high limit of detection (LOD) or particular populations with a low treatment failure rate [18, 51, 52]. Furthermore, leftcensoring due to the lower detection limit of an assay is a problem in many fields such as biology, immunology, chemistry, and the environmental sciences in which high censoring rates may be frequent. Our simulation study shows that the difference in performance between Lassoregularized BuckleyJames methods and Lassoregularized simple imputation methods increased with the censoring rate.
In our simulations and real application, the detection threshold was the same for all subjects. The detection threshold may vary among subjects, for example, in multicentric studies. Our R code also supports multiple lower limits of quantification. However, the findings should be interpreted with caution: differences in technological equipment could be a confounding factor that might help explain the differences in patient response to HIV treatment (in addition to HIV mutations). Adjusting or stratifying for the hospital would then be necessary.
In this study we focused on the prediction performance of Lassoregularized methods. In clinical applications, even when prediction accuracy is the main objective, researchers aim to identify which predictors are more strongly associated with outcome. Our proposal could be easily extended or adapted to support other Lassotype penalties. When the primary goal is to infer the set of truly relevant variables, the adaptive Lasso and the bootstrapenhanced Lasso could thus be considered.
References
 1
Paxton W, Coombs R, McElrath M, Keefer M, Hughes J, Sinangil F, Chernoff D, Demeter L, B BW, Corey L. Longitudinal analysis of quantitative virologic measures in human immunodeficiency virusinfected subjects with > or = 400 CD4 lymphocytes: implications for applying measurements to individual patients. National Institute of Allergy and Infectious Diseases AIDS Vaccine Evaluation Group. J Infect Dis. 1997; 175(2):247–54.
 2
Helsel DR. More than obvious: Better methods for interpreting nondetect data. Environ Sci Technol. 2005; 39(20):419–23.
 3
Lee M, Kong L, Weissfeld L. Multiple imputation for leftcensored biomarker data based on Gibbs sampling method. Stat Med. 2012; 31:1838–48.
 4
Del Greco M F, Pattaro C, Minelli C, Thompson JR. Bayesian analysis of censored response data in familybased genetic association studies. Biom J. 2016; 58(5):1039–53.
 5
Marschner I, Betensky R, DeGruttola V, Hammer S, Kuritzkes D. Clinical trials using HIV1 RNAbased primary endpoints: Statistical analysis and potential biase. J Acquir Immune Defic Syndr Hum Retrovirol. 1999; 20(3):220–7.
 6
Gillespie BW, Chen Q, Reichert H, Franzblau A, Hedgeman E, Lepkowski J, Adriaens P, Demond A, Luksemburg W, Garabrant DH. Estimating population distributions when some data are below a limit of detection by using a reverse KaplanMeier estimator. Epidemiology. 2010; 21:64–70.
 7
Dinse G, Jusko A, Ho L, Annam K, Graubard B, HertzPicciotto I, Miller F, Gillespie B, Weinberg C. Accomodating measurements below a limit of detection: A novel application of Cox regression. Am J Epidemiol. 2014; 179(8):1018–24.
 8
Wang HJ, Zhu Z, Zhou J. Quantile regression in partially linear varying coefficient models. Ann Stat. 2009; 37(6B):3841–66.
 9
Eilers PH, Röder E, Savelkoul HF, van Wijk RG. Quantile regression for the statistical analysis of immunological data with many nondetects. BMC Immunol. 2012; 13:13–37.
 10
Powell JL. Least absolute deviations estimation for the censored regression model. J Econ. 1984; 25:303–25.
 11
Powell JL. Censored regression quantiles. J Econom. 1986; 32:143–55.
 12
Tobin J. Estimation of relationships for limited dependent variables. Econometrica. 1958; 26:24–36.
 13
Hughes JP. Mixed effects models with censored data with application to HIV RNA levels. Biometrics. 1999; 55:625–9.
 14
JacqminGadda H, Thiébaut R, Chêne G, Commenges D. Analysis of leftcensored longitudinal data with application to viral load in HIV infection. Biostatistics. 2000; 1(4):355–68.
 15
Lynn HS. Maximum likelihood inference for leftcensored HIV RNA data. Stat Med. 2001; 20:33–45.
 16
Nie L, Chu H, Liu C, Cole SR, Vexler A, Schisterman EF. Linear regression with an independent variable subject to a detection limit. Epidemiology. 2010; 21:17–24.
 17
Fu P, Hughes J, Zeng G, Hanook S, Orem J, Mwanda O, Remick S. A comparative investigation of methods for longitudinal data with limits of detection through a case study. Stat Methods Med Res. 2016; 25(1):153–66.
 18
Wiegand RE, Rose CE, Karon JM. Comparison of models for analyzing twogroup, crosssectional data with a gaussian outcome subject to a detection limit. Stat Methods Med Res. 2016; 25(6):2733–49.
 19
Buckley J, James I. Linear regression with censored data. Biometrika. 1979; 66:429–36.
 20
Hewett P, Ganser GH. A comparison of several methods for analyzing censored data. Ann Occup Hyg. 2007; 51:611–32.
 21
Uh HW, Hartgers FC, Yazdanbakhsh M, HouwingDuistermaat JJ. Evaluation of regression methods when immunological measurements are constrained by detection limits. BMC Immunol. 2008; 9(1):59.
 22
Kafatos G, Andrews N, McConway KJ, Farrington P. Regression models for censored serological data. J Med Microbiol. 2013; 62(Pt 1):93–100.
 23
Hirsch MS, Günthard HF, Schapiro JM, Vézinet FB, Clotet B, Hammer SM, Johnson VA, Kuritzkes DR, Mellors JW, Pillay D, et al. Antiretroviral drug resistance testing in adult HIV1 infection: 2008 recommendations of an International AIDS SocietyUSA panel. Clin Infect Dis. 2008; 47(2):266–85.
 24
Wittkop L, Günthard H, de Wolf F, Dunn D, CozziLepri A, de Luca A, Kücherer C, Obel N, von Wyl V, Masquelier B, Stephan C, Torti C, Antinori A, Garcia F, Judd A, Porter K, Thiébaut R, Castro H, van Sighem A, Colin C, Kjaer J, Lundgren J, Paredes R, Pozniak A, Clotet B, philipps A, Pillay D, Chêne G, study group EC. Effects of transmitted drug resistance on virological and immunological response to initial combination antiretroviral therapy for HIV (eurocoordchain joint project): a european multicohort study. Lancet Infect Dis. 2011; 11(5):363–71.
 25
Hofstra LM, Sauvageot N, Albert J, Alexiev I, Garcia F, Struck D, Van de Vijver DA, Åsjö B, Beshkov D, Coughlan S, et al. Transmission of HIV drug resistance and the predicted effect on current firstline regimens in europe. Clin Infect Dis. 2016; 62(5):655–63.
 26
Wensing AM, Calvez V, Günthard HF, Johnson VA, Paredes R, Pillay D, Shafer RW, Richman DD. 2017 update of the drug resistance mutations in HIV1. Top Antivir Med. 2017; 24(4):132.
 27
Rabinowitz M, Myers L, Banjevic M, Chan A, SweetkindSinger J, Haberer J, McCann K, Wolkowicz R. Accurate prediction of HIV1 drug response from the reverse transcriptase and protease amino acid sequences using sparse models created by convex optimization. Bioinformatics. 2006; 22(5):541–9.
 28
Beerenwinkel N, Montazeri H, Schuhmacher H, Knupfer P, von Wyl V, Furrer H, Battegay M, Hirschel B, Cavassini M, Vernazza P, Bernasconi E, Yerly S, Böni J, Klimkait T, Cellerai C, Günthard HF, Study TSHC. The individualized genetic barrier predicts treatment response in a large cohort of HIV1 infected patients. PLoS Comput Biol. 2013; 9(8):1–11.
 29
CozziLepri A, Prosperi MCF, Kjær J, Dunn D, Paredes R, Sabin CA, Lundgren JD, Phillips AN, Pillay D, for the EuroSIDA, the United Kingdom CHIC/United Kingdom HDRD Studies. Can linear regression modeling help clinicians in the interpretation of genotypic resistance data? an application to derive a lopinavirscore. PLoS ONE. 2011; 6(11):1–9.
 30
Wittkop L, Commenges D, Pellegrin I, Breilh D, Neau D, Lacoste D, Pellegrin JL, Chêne G, Dabis F, Thiébaut R. Alternative methods to analyse the impact of HIV mutations on virological response to antiviral therapy. BMC Med Res Methodol. 2008; 8(1):68.
 31
Assoumou L, Houssaïna A, Corstagliola D, Flandre P, Standardization and clinical relevance of HIV drug resistance testing project from the forum for collaborative HIV research. Relative contributions of baseline patient characteristics and the choice of statistical methods to the variability of genotypic resistance scores: the example of didanosine. J Antimicrop Chemother. 2010; 65(4):752–60.
 32
Rhee S, Taylor J, Wadhera G, BenHur A, Brutlag D, Shafer R. Genotypic predictos of human immunodeficiency cirus type 1 drug resistance. Proc Natl Acad Sci USA. 2006; 103(46):17355–60.
 33
Tibshirani R. The Lasso method for variable selection in the Cox model. Stat Med. 1997; 16:385–95.
 34
Huang J, Ma S, Xie H. Regularized estimation in the accelerated failure time model with highdimensional covariates. Biometrics. 2006; 62:813–20.
 35
Datta S, LeRademacher J, Datta S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and Lasso. Biometrics. 2007; 63:259–71.
 36
Johnson BA. Variable selection in semiparametric linear regression with censored data. J R Stat Soc Ser B Stat Methodol. 2008; 70:351–70.
 37
Wang S, Nan B, Zhu J, Beer DG. Doubly penalized BuckleyJames method for survival data with highdimensional covariates. Biometrics. 2008; 64(1):132–40.
 38
Cai T, Huang J, Tian L. Regularized estimation for the accelerated failure time model. Biometrics. 2009; 65:394–404.
 39
Ueki M. A note on automatic variable selection using smooththreshold estimating equations. Biometrika. 2009; 96(4):1005–11.
 40
Wang Z, Wang CY. Buckleyjames boosting for survival analysis with highdimensional biomarker data. Stat Appl Genet Mol Biol. 2010; 9(1):24.
 41
Shows JH, Lu W, Zhang HH. Sparse estimation and inference for censored median regression. J Stat Plan Infer. 2010; 140:1903–17.
 42
Wang HJ, Zhou J, Li Y. Variable selection for censored quantile regression. Stat Sin. 2013; 23(1):145–67.
 43
Chung M, Long Q, Johnson BA. A tutorial on rankbased coefficient estimation for censored data in smalland largescale problems. Stat Comput. 2013; 23(5):601–14.
 44
Huang X, Pan W, Park S, Han X, Miller LW, Hall J. Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares. Bioinformatics. 2004; 20(6):888–94.
 45
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008; 2:841–60.
 46
Wang Y, Chen T, Zeng D. Support vector hazards machine: A counting process framework for learning risk scores for censored outcomes. J Mach Learn Res. 2016; 17(167):1–37.
 47
Van der Burgh HK, Schmidt R, Westeneng HJ, de Reus MA, van den Berg LH, van den Heuvel MP. Deep learning predictions of survival based on MRI in amyotrophic lateral sclerosis. NeuroImage Clin. 2017; 13:361–9.
 48
Johnson BA. On lasso for censored data. Electron J Stat. 2009; 3:485–506.
 49
CozziLepri A. Initiatives for developing and comparing genotype interpretation systems: external validation of existing rulebased interpretation systems for abacavir against virological response. HIV Med. 2008; 9(1):27–40.
 50
Marks G, Gardner LI, Craw J, Giordano TP, Mugavero MJ, Keruly JC, Wilson TE, Metsch LR, Drainoni ML, Malitz F. The spectrum of engagement in HIV care: do more than 19% of HIVinfected persons in the US have undetectable viral load?. Clin Infect Dis. 2011; 53(11):1168–9.
 51
Dao CN, Patel P, Overton ET, Rhame F, Pals SL, Johnson C, Bush T, Brooks JT, Study to Understand the Natural History of HIV and AIDS in the Era of Effective Therapy (SUN) Investigators. Low vitamin D among HIVinfected adults: prevalence of and risk factors for low vitamin D levels in a cohort of HIVinfected adults and comparison to prevalence among adults in the US general population. Clin Infect Dis. 2011; 52(3):396–405.
 52
Leon A, Perez I, RuizMateos E, Benito JM, Leal M, LopezGalindez C, Rallon N, Alcami J, LopezAldeguer J, Viciana P, et al. Rate and predictors of progression in elite and viremic HIV1 controllers. AIDS. 2016; 30(8):1209–20.
 53
Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol. 1996; 58(1):267–88.
 54
Sigrist F, Stahel WA. Using the censored gamma distribution for modeling fractional response variables with an application to loss given default. ASTIN Bull J Int Actuar Assoc. 2011; 41(02):673–710.
 55
Belloni A, Chernozhukov V. L1penalized quantile regression in highdimensional sparse models. Ann Stat. 2011; 39(1):82–130.
 56
Xue X, Xie X, Strickler HD. A censored quantile regression approach for the analysis of time to event data. Stat Methods Med Res. 2018; 27(3):955–65.
 57
Zhanfeng W, Yaohua W, Lincheng Z. A lassotype approach to variable selection and estimation for censored regression model. Chin J Appl Probab Stat. 2010; 26(1):66–80.
 58
Yue YR, Hong HG. Bayesian tobit quantile regression model for medical expenditure panel survey data. Stat Model. 2012; 12(4):323–46.
 59
Liu X, Wang Z, Wu Y. Group variable selection and estimation in the tobit censored response model. Comput Stat Data Anal. 2013; 60:80–9.
 60
Zhou X, Liu G. LADlasso variable selection for doubly censored median regression models. Commun Stat Theory Methods. 2013; 45(12):3658–67.
 61
Alhamzawi R. Bayesian elastic net tobit quantile regression. Commun Stat Simul Comput. 2016; 45(7):2409–27.
 62
Müller P, van de Geer S. Censored linear model in high dimensions. TEST. 2015; 25(1):75–92.
 63
Peter Wu CS, Zubovic Y. A largescale monte carlo study of the BuckleyJames estimator with censored data. J Stat Comput Simul. 1995; 51(24):97–119.
 64
Wang YG, Zhao Y, Fu L. The Buckley–James estimator and induced smoothing. Aust N Z J Stat. 2016; 58(2):211–25.
 65
Gleit A. Estimation for small normal data sets with detection limits. Environ Sci Technol. 1985; 19(12):1201–6.
 66
Johnson BA, Long Q, Chung M. On path restoration for censored outcomes. Biometrics. 2011; 67:1379–88.
 67
Zhao SD, Lee D, Li Y. The Dantzig selector for censored linear regression models. Stat Sin. 2014; 24(1):251–68.
 68
DiRienzo AG. Parsimonious covariate selection with censored outcomes. Biometrics. 2016; 72:452–62.
 69
R Core Team. R: A language and environment for statistical computing. Vienna: R foundation for statistical computing; 2017. ISBN 3900051070, http://www.Rproject.org.
 70
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33(1):1–22.
 71
Wang Z, Wang MZ, Suggests T. bujar: BuckleyJames regression for survival data with highdimensional covariates. 2015. R package version 0.21. https://CRAN.Rproject.org/package=bujar.
 72
Iyidogan P, Anderson KS. Current perspectives on HIV1 antiretroviral drug resistance. Viruses. 2014; 6(10):4095–139.
 73
Shafer RW, Schapiro JM. HIV1 drug resistance mutations: an updated framework for the second decade of HAART. AIDS Rev. 2008; 10(2):67.
 74
Johnson VA, BrunVézinet F, Clotet B, Gunthard H, Kuritzkes DR, Pillay D, Schapiro JM, Richman DD. Update of the drug resistance mutations in HIV1: December 2009. Top HIV Med. 2009; 17(5):138–45.
Acknowledgements
We acknowledge members of the Standardization and Clinical Relevance of HIV Drug Resistance Testing Project for the Forum for Collaborative HIV Research. We thank “Sidaction, Ensemble contre le Sida”, France, for their continuous support. We would like to thank Binbin Xu, postdoctoral researcher at SISTM research team from Inserm BPH U1219 & Inria BSO, for his help on testing the R code.
Funding
This work was partially supported by the “Investissements d’Avenir” program managed by the ANR under reference ANR10LABX77
Availability of data and materials
R code and artificial example are available at https://github.com/psBiostat/leftcensoredLasso.
Author information
Affiliations
Contributions
PS developed the algorithms and corresponding R code, carried out the statistical analysis and helped to draft the manuscript. MA developed the algorithms, revised the R code and drafted the manuscript. RT designed and supervised the applied research. RT and LW interpreted the results of the analysis. DC revised the methodology. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Soret, P., Avalos, M., Wittkop, L. et al. Lasso regularization for leftcensored Gaussian outcome and highdimensional predictors. BMC Med Res Methodol 18, 159 (2018). https://doi.org/10.1186/s1287401806094
Received:
Accepted:
Published:
Keywords
 Limit of detection
 BuckleyJames least squares procedure
 HIV viral load
 Drug resistance
 HIV genotypic mutations
 Crosssectional studies