Modeling Variables with a Detection Limit using a Truncated Normal Distribution with Censoring

When data are collected subject to a detection limit, observations below the detection limit may be considered censored. In addition, the domain of such observations may be restricted; for example, values may be required to be non-negative. We propose a regression method for censored observations that also accounts for domain restriction. The method finds maximum likelihood estimates assuming an underlying truncated normal distribution. We show that our method, tcensReg, outperforms other methods commonly used for data with detection limits such as Tobit regression and single imputation of the detection limit or half detection limit with respect to bias and mean squared error under a range of simulation settings. We apply our method to analyze vision quality data collected from ophthalmology clinical trials comparing different types of intraocular lenses implanted during cataract surgery.


Introduction
Censoring, in which the value of an observation is not known exactly but rather is only known to be above or below a specific value, is prevalent in many data settings. Censoring occurs with time-to-event data, but can also occur when measurements are subject to a detection limit (DL). A detection limit is defined as the lowest quantity or concentration of a compound that can be reliably detected with a given analytical method (Hornung and Reed, 1990). Quantities below the DL can be considered censored. Detection limits and the censored observations associated with them are encountered in epidemiology (Lubin, Colt, Camann, Davis, Cerhan, Severson, Bernstein, and Hartge, 2004; Schisterman, Vexler, Whitcomb, and Liu, 2006), hydrology (Helsel, 1990), chemistry (Analytical Methods Committee, 1987), toxicology (Zaugg, Sandstrom, Smith, and Fehlberg, 1995), and economics (McDonald and Moffitt, 1980; Greene, 2018).
Estimation of the parameters of a normal distribution based on a sample with censored observations has a long history of investigation. Hald (1949) was one of the first authors to develop maximum likelihood estimation methods for this data setting. Much of the early work focused on single mean models (Hald, 1949; Gupta, 1952; Harter and Moore, 1966; Tiku, 1967) or estimation using order statistics (Sarhan and Greenberg, 1956; Dixon, 1960). Regression with a dependent variable subject to censoring gained prominence with the work of Tobin (1958), who developed the Tobit model, also called a censored regression or Tobit regression model, to estimate linear relationships between variables when there is censoring in the dependent variable. Linear regression methods with an unspecified censored distribution were developed by Buckley and James (1979).
In settings in which censored data arise due to a DL, estimation is sometimes performed by singly imputing the DL or 1/2 DL for observations below the DL. While these methods are known to yield biased estimates of the mean and standard deviation, they are still routinely applied due to convenience. Single imputation of the DL yields upwardly biased estimates of the mean (Helsel, 1990). There can also be substantial bias using 1/2 DL imputed values, with the direction of the bias depending on the underlying data mechanism (Hornung and Reed, 1990; Helsel, 1990; Lubin et al., 2004). Lubin et al. (2004) noted that the bias of parameter estimates when using 1/2 DL is substantial unless the proportion of censored observations is small, defined as less than 10%.
Related to but distinct from censoring is the concept of truncation. A truncated distribution is a conditional distribution that results from restricting the domain of some other probability distribution. For example, the zero-truncated Poisson distribution is the distribution of a Poisson random variable conditional on the value of the random variable being nonzero. Truncation is a strict restriction of the domain of the random variable; observations outside the domain cannot occur. In contrast, when an observation is censored, its true value is known to lie beyond the censoring threshold, and such true values are permitted to occur. Truncation is a property of the population, while censoring is a result of the sampling mechanism.
Methods have been developed for estimating the parameters of a truncated normal distribution. Hald (1949) developed methods for the single mean model, while later work by Cohen Jr (1950) provided solutions for settings involving normal distributions with known or unknown truncation points and single or double truncation. Halperin (1952) extended the theoretical framework to settings in which the distribution satisfies regularity conditions but is not necessarily normal. Work by Amemiya (1973) expanded the truncated regression framework by demonstrating the consistency and asymptotic normality properties of the maximum likelihood estimator and identifying consistent initial estimators.
Both censored and truncated data are often referred to as limited dependent variables in the economics literature (Tobin, 1958; Greene, 2018). There are currently methods of parameter estimation applicable to normal data that are censored and to data that are from a truncated normal distribution. However, estimation for the case in which both truncation and censoring simultaneously generate the data, i.e., an underlying truncated normal distribution that is censored, is lacking. In this manuscript we propose maximum likelihood estimation techniques for such data. The methods can be seen as analogous to those of Tobin (1958) with the latent normal distribution replaced with a latent truncated normal distribution.
The paper is organized as follows. In Section 2, a motivating example from an analysis of visual quality data from clinical trials of intraocular lenses implanted during cataract surgery is introduced. In Section 3 the methods for maximum likelihood estimation are outlined. Subsection 3.1 motivates the problem from a distributional perspective by defining the assumptions and the mechanism for censoring and truncation and develops the framework for estimation of a single mean model. In Subsection 3.2, methodology is developed for linear regression with left censored data from a left truncated normal distribution. In Subsection 3.3, the regression framework is extended to allow for heteroskedastic variances. In Subsection 3.4, methods to iteratively solve the log-likelihood for the parameters of interest are discussed. In Section 4, the proposed methodology is compared to other methods in a simulation study. Section 5 applies the methods to our motivating example. Finally, Section 6 reflects on the significance of the results, discusses limitations and notes possible areas for extending the work.

Problem Motivation
Our application concerns left censored non-negative data arising from contrast sensitivity testing in clinical trials for intraocular lenses implanted during cataract surgeries.
Contrast sensitivity measures the visual quality experienced by a subject by testing his or her ability to distinguish increasingly finer increments of light versus dark. Being unable to distinguish objects when contrast is low, i.e., when there is little difference between light and dark, can make everyday tasks such as night driving, navigating new settings, or perceiving distances difficult (Owsley and Sloane, 1987). Cataract patients tend to have especially poor contrast sensitivity due to the clouding of the natural lens. The primary treatment for cataracts is cataract surgery, during which the natural clouded lens is removed and replaced with a new synthetic intraocular lens (IOL). The patient should see improvement in visual acuity and visual quality following IOL implantation. As such, contrast sensitivity is an important clinical outcome for patients who receive IOLs during cataract surgery.
Contrast sensitivity testing is performed using standardized charts with alternating light and dark bars, referred to as gratings. To determine the contrast sensitivity of a patient, the intensity of the contrast between the bars as well as the spacing is reduced until the patient is no longer able to perceive separate bars. The contrast is defined as the relative difference in luminance of the bars from the background and may be calculated using the Weber contrast, $(L_{\max} - L_{\min})/L_{\mathrm{background}}$, the Michelson contrast, $(L_{\max} - L_{\min})/(L_{\max} + L_{\min})$, or the RMS contrast, $L_\sigma / L_\mu$, where $L_{\max}$, $L_{\min}$, $L_{\mathrm{background}}$, $L_\mu$, and $L_\sigma$ are the maximum, minimum, background, mean, and standard deviation of the luminance, respectively (Pelli and Bex, 2013). Typically when using gratings to test contrast, the Michelson contrast is preferred. Contrast sensitivity is defined as the reciprocal of the threshold contrast, which is the lowest contrast at which the grating can be identified. The spacing between the bars is measured in cycles per degree (CPD), with higher values of CPD indicating less space between bars. The testing is performed across a variety of different CPD levels, under either bright or dim lighting conditions, and with or without glare. In general, contrast sensitivity scores are lower when testing is performed under dim lighting with glare.
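The contrast definitions above are simple luminance ratios. As an illustrative sketch (the function names are ours, not part of any standardized testing software), the Michelson contrast and the resulting log contrast sensitivity can be computed as:

```python
import math

def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin), lies in [0, 1]."""
    return (l_max - l_min) / (l_max + l_min)

def log_contrast_sensitivity(threshold_contrast):
    """Contrast sensitivity is the reciprocal of the threshold contrast.

    Since Michelson contrast lies in (0, 1], sensitivity lies in
    [1, infinity) and the log10 value is therefore non-negative.
    """
    return math.log10(1.0 / threshold_contrast)
```

For example, a threshold contrast of 0.01 corresponds to a log contrast sensitivity of 2.0, and a threshold contrast of 1 (the coarsest possible grating) corresponds to the lower bound of 0.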
At each CPD level, the subject is presented with a sample grating followed by 8 gratings that progressively decrease the intensity of the image contrast. Figure 1 shows the testing setup used for 12.0 CPD, which is one of the most common visual quality outcomes analyzed in ophthalmic clinical trials. The subject is first asked if they can identify the striped pattern in the sample image, and is subsequently shown each column starting from 1 and asked to identify whether the striped pattern is in the top, bottom, or neither grating. A contrast sensitivity score is recorded as the lowest level of identifiable contrast, i.e., the lowest intensity contrast for which the patient is able to correctly identify the striped pattern. The contrast sensitivity score ranges from 0 to 8, with 0 representing the ability to identify only the sample grating and 8 corresponding to the last column with the lowest image contrast. If a subject is unable to identify the sample grating, the value is recorded as -1. The scores of 0-8 are converted to continuous log contrast sensitivity values based on the manufacturers' recommendations as shown in Table 1. Note that by definition, the contrast sensitivity values using the Michelson formula range over [1, ∞), meaning that the log contrast sensitivity scores must be non-negative, i.e., lie in [0, ∞).
Problems arise using this scoring approach when patients are unable to identify the sample grating, i.e., have a contrast sensitivity threshold score of -1. The true contrast sensitivity values for these patients are only known to be below the sample threshold. This is equivalent to having left censored observations. However, we also know that contrast sensitivity values must be non-negative, implying that the population values are left truncated at zero. Therefore, to accurately estimate the mean and standard deviation of contrast sensitivity for a particular IOL, both the censoring and truncation need to be accounted for.
Our data come from two prospective clinical trials for IOLs implanted during cataract surgery. The first study, ClinicalTrials.gov Identifier NCT01510717, compared a monofocal lens and a multifocal lens in a double blind randomized parallel group study with bilateral IOL implantation, and the second study, ClinicalTrials.gov Identifier NCT01424189, compared two different multifocal lenses in a nonrandomized parallel assignment multi-center study, again with bilateral IOL implantation. Our analysis will be restricted to data reported for binocular (both eyes open) testing, under dim lighting with and without glare. All observations were taken 6 months after surgery. Figure 2 shows the marginal histograms with kernel density smoothers of each IOL by glare condition. Visual inspection of the data suggests that it may be reasonable to assume an underlying normal distribution for the log contrast sensitivity scores, with scores being censored at the lower-bound detection limit.
The goal of the analysis was to estimate the difference in mean log contrast sensitivity between the monofocal lens and each multifocal lens. Historically, monofocal lenses have provided patients with better contrast sensitivity, but multifocal lenses are often preferred by patients for visual acuity as they provide both distance and near vision with increased spectacle independence. A common clinical trial hypothesis is to test whether the multifocal lens is non-inferior to the monofocal lens with respect to contrast sensitivity. Based on regulatory guidelines, losses of 0.3 log units are considered to be clinically significant when they occur at two or more spatial frequencies (ISO 11979-7, 2018). The non-inferiority margin for contrast sensitivity is set at half of this clinically significant magnitude, i.e., a loss of 0.15 log units. Our goal was thus to estimate pairwise differences in mean log contrast sensitivity between the monofocal lens and each multifocal lens to test for non-inferiority of visual quality.

Methods
This section first develops maximum likelihood estimation for a sample from a truncated normal distribution with censored observations, i.e., a single mean model. The methods are then extended to include a linear predictor for the mean to allow for linear regression applications. We then propose an extension to handle heteroskedastic variances. We focus on the setting in which the truncation and censoring occur only on the lower end of the distribution, i.e., left truncation and left censoring. The results can be generalized to right truncation and right censoring. Finally, the process of finding the maximum likelihood estimates is described using different optimization techniques.

Single Mean Model
As a first step, we obtain the likelihood function for censored data from a normal distribution. Assume a latent normal distribution with mean µ and variance σ² for the random variable X*. Following Tobin (1958), assume that the values of X* are left-censored at ν, ν ∈ R, to produce the random variable X defined as

$$X = \begin{cases} \nu & \text{if } X^* \le \nu, \\ X^* & \text{if } X^* > \nu. \end{cases}$$

Here, censored observations are reported as the DL ν. The values of X represent the observed values in the sample, which are a partial representation of the values of X*. Assume that a total of n observations are independently drawn, with $n_0$ observations censored, i.e., $n_0 = \sum_{i=1}^{n} 1_{\{x_i = \nu\}}$, and $n_1$ observations uncensored, i.e., $n_1 = \sum_{i=1}^{n} 1_{\{x_i > \nu\}}$. The likelihood function for such data is

$$L(\mu, \sigma \mid \mathbf{x}) = \left[\Phi\!\left(\frac{\nu - \mu}{\sigma}\right)\right]^{n_0} \prod_{i \in S_1} \frac{1}{\sigma}\,\phi\!\left(\frac{x_i - \mu}{\sigma}\right),$$

where φ(·) and Φ(·) denote the standard normal pdf and cdf and $S_1$ is the set of all uncensored observations. Therefore the log-likelihood is

$$\ell(\mu, \sigma \mid \mathbf{x}) = n_0 \log \Phi\!\left(\frac{\nu - \mu}{\sigma}\right) - n_1 \log \sigma + \sum_{i \in S_1} \log \phi\!\left(\frac{x_i - \mu}{\sigma}\right).$$

Maximum likelihood estimates of µ and σ can be found using iterative optimization techniques, such as the Newton-Raphson algorithm, as discussed in Tobin (1958).
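For concreteness, the censored-normal (Tobit) log-likelihood can be evaluated directly. The following is a minimal Python sketch using only the standard library (the paper's actual implementation is the R package tcensReg; the function name here is ours):

```python
from math import log
from statistics import NormalDist

def censored_normal_loglik(x, mu, sigma, nu):
    """Tobit log-likelihood: observations at the detection limit nu
    contribute log Phi((nu - mu)/sigma); uncensored observations
    contribute the usual normal log-density."""
    z = NormalDist()  # standard normal for phi and Phi
    ll = 0.0
    for xi in x:
        if xi <= nu:  # censored: recorded as the DL
            ll += log(z.cdf((nu - mu) / sigma))
        else:         # observed exactly
            ll += log(z.pdf((xi - mu) / sigma) / sigma)
    return ll
```

In practice this function would be handed to a numerical optimizer over (µ, σ) to produce the maximum likelihood estimates.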
Now we replace the latent normal distribution above with a latent truncated normal distribution, truncated from below at the value a. Call this random variable Y*, which has the following pdf and cdf from Cohen Jr (1950):

$$f_{Y^*}(y) = \frac{\phi\!\left(\frac{y - \mu}{\sigma}\right)}{\sigma\left[1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right]}, \qquad F_{Y^*}(y) = \frac{\Phi\!\left(\frac{y - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}{1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}, \qquad y > a.$$
The distribution of Y* is a scaled version of a normally distributed random variable, obtained by dividing the pdf by the constant $1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$ to obtain a proper probability density function that integrates to one. We will denote a latent truncated normal random variable with left truncation at a constant a ∈ R as

$$Y^* \sim \mathrm{TN}(\mu, \sigma^2, a),$$

where we assume that the truncation value a is known and therefore fixed. For non-negative variables, a = 0.
Note that the parameter µ denotes the mean of the underlying normal distribution prior to truncation, rather than the mean of the truncated normal distribution, which is

$$\mu_{TN} = \mu + \sigma \frac{\phi(\alpha)}{1 - \Phi(\alpha)}, \qquad \alpha = \frac{a - \mu}{\sigma}$$

(Greene, 2018). Throughout the paper we focus on the estimation of this underlying central tendency parameter µ rather than µ_TN.
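As a quick numerical check of this relationship, the truncated-distribution mean can be computed with Python's standard library (a sketch; the helper name is ours):

```python
from statistics import NormalDist

def truncated_normal_mean(mu, sigma, a):
    """Mean of N(mu, sigma^2) left-truncated at a:
    mu_TN = mu + sigma * phi(alpha) / (1 - Phi(alpha)),
    where alpha = (a - mu) / sigma."""
    z = NormalDist()
    alpha = (a - mu) / sigma
    return mu + sigma * z.pdf(alpha) / (1 - z.cdf(alpha))
```

For a standard normal truncated at zero, µ_TN = φ(0)/(1 − Φ(0)) ≈ 0.798, illustrating how far the truncated-distribution mean can sit from the underlying µ = 0; when the truncation point is far below the support, µ_TN collapses back to µ.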
The log-likelihood for the truncated normal distribution with n independent observations drawn from this distribution is

$$\ell(\mu, \sigma \mid \mathbf{y}) = -n \log \sigma - n \log\left[1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right] + \sum_{i=1}^{n} \log \phi\!\left(\frac{y_i - \mu}{\sigma}\right).$$

There is no closed form for the maximum likelihood estimates of µ and σ for the truncated normal distribution. However, Cohen Jr (1950) described methods for iteratively solving for the maximum likelihood estimates, and Amemiya (1973) showed that the estimates are consistent and asymptotically normal.
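This truncated-normal log-likelihood is equally short in code. A standard-library Python sketch (the R package truncreg provides a full implementation; the function name here is ours):

```python
from math import log
from statistics import NormalDist

def truncated_normal_loglik(y, mu, sigma, a=0.0):
    """Log-likelihood of n draws from N(mu, sigma^2) truncated below
    at a: each observation's normal log-density is offset by the
    normalizing constant log(1 - Phi((a - mu)/sigma))."""
    z = NormalDist()
    n = len(y)
    ll = -n * log(sigma) - n * log(1 - z.cdf((a - mu) / sigma))
    for yi in y:
        ll += log(z.pdf((yi - mu) / sigma))
    return ll
```

When a is far below the data the truncation term vanishes and the value coincides with the ordinary normal log-likelihood; truncation at a = 0 inflates the density of every admissible observation.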
Further suppose that the values of Y* are censored at ν > a, ν ∈ R, to produce the random variable Y defined as

$$Y = \begin{cases} \nu & \text{if } Y^* \le \nu, \\ Y^* & \text{if } Y^* > \nu, \end{cases}$$

where ν is a known constant. For example, in the contrast sensitivity problem the testing procedure has an inherent detection limit and cannot detect values below 0.61 log units. The log contrast sensitivity scores must also be non-negative, as discussed earlier in Section 2.
The pdf for the truncated random variable with censoring can be expressed as

$$f_Y(y) = \left[\frac{\Phi\!\left(\frac{\nu - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}{1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}\right]^{1_{\{y = \nu\}}} \left[\frac{\phi\!\left(\frac{y - \mu}{\sigma}\right)}{\sigma\left[1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right]}\right]^{1_{\{y > \nu\}}}.$$

The first term of this equation captures censored observations, in which case the data are reported as the detection limit value ν and the cdf of the truncated normal is used to provide information for the likelihood. In the second term, the observation is not censored and we simply use the pdf of the truncated normal. Figure 3 shows an example of the probability density function for such a random variable.
Out of a total of n observations, again let $n_0$ be censored, $n_0 = \sum_{i=1}^{n} 1_{\{y_i = \nu\}}$, and $n_1$ be uncensored, $n_1 = \sum_{i=1}^{n} 1_{\{y_i > \nu\}}$. Let S define the set of all observations, $S_1$ be the set of uncensored observations, and $S_0$ be the set of censored observations, i.e., $S_0 \cup S_1 = S$. Assuming that the observations are drawn independently, the likelihood is

$$L(\mu, \sigma \mid \mathbf{y}) = \prod_{i \in S_0} \frac{\Phi\!\left(\frac{\nu - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}{1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)} \prod_{i \in S_1} \frac{\phi\!\left(\frac{y_i - \mu}{\sigma}\right)}{\sigma\left[1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right]}.$$

Taking the log of the likelihood, we have

$$\ell(\mu, \sigma \mid \mathbf{y}) = n_0 \log\left[\Phi\!\left(\frac{\nu - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right] - n \log\left[1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right] - n_1 \log \sigma + \sum_{i \in S_1} \log \phi\!\left(\frac{y_i - \mu}{\sigma}\right).$$

Similar to the censored-only and truncated-only log-likelihoods, the maximum likelihood estimators of µ and σ for the truncated normal distribution with censoring do not have a closed form, but they can be computed using the iterative procedures discussed in Section 3.4.
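This combined likelihood is the paper's central object. A minimal standard-library Python sketch (the production implementation is the authors' R package tcensReg; the function name is ours):

```python
from math import log
from statistics import NormalDist

def tcens_loglik(y, mu, sigma, nu, a=0.0):
    """Log-likelihood for N(mu, sigma^2) left-truncated at a and
    left-censored at nu > a; censored observations are recorded as nu."""
    z = NormalDist()
    a_star = (a - mu) / sigma
    nu_star = (nu - mu) / sigma
    log_trunc = log(1 - z.cdf(a_star))  # truncation normalizer
    ll = 0.0
    for yi in y:
        if yi <= nu:  # censored: truncated-normal cdf at nu
            ll += log(z.cdf(nu_star) - z.cdf(a_star)) - log_trunc
        else:         # uncensored: truncated-normal density
            ll += log(z.pdf((yi - mu) / sigma) / sigma) - log_trunc
    return ll
```

When a is pushed far below the support, the truncation correction vanishes and the expression reduces to the Tobit log-likelihood of the censored-only model.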

Linear Predictor for the Mean
The goal of many applications is to understand how certain predictors influence the mean or to compare the means of different populations. This can be accomplished by using the linear predictor $x_i^T \beta$, i.e.,

$$Y_i^* \sim \mathrm{TN}(x_i^T \beta, \sigma^2, a).$$

Again, note that $x_i^T \beta$ is the mean of the underlying normal distribution rather than the mean of the truncated normal distribution, that is,

$$\mu_{TN,i} = x_i^T \beta + \sigma \frac{\phi\!\left(\frac{a - x_i^T \beta}{\sigma}\right)}{1 - \Phi\!\left(\frac{a - x_i^T \beta}{\sigma}\right)}.$$

All of the unknown parameters can be collected into the vector $\theta = (\beta, \sigma^2)^T$, which has length p. The corresponding pdf and cdf are

$$f_{Y_i^*}(y) = \frac{\phi\!\left(\frac{y - x_i^T \beta}{\sigma}\right)}{\sigma\left[1 - \Phi\!\left(\frac{a - x_i^T \beta}{\sigma}\right)\right]}, \qquad F_{Y_i^*}(y) = \frac{\Phi\!\left(\frac{y - x_i^T \beta}{\sigma}\right) - \Phi\!\left(\frac{a - x_i^T \beta}{\sigma}\right)}{1 - \Phi\!\left(\frac{a - x_i^T \beta}{\sigma}\right)}.$$
Suppose that this truncated normal distribution is then censored at the value ν, where ν > a. Let the notation

$$a_i^* = \frac{a - x_i^T \beta}{\sigma}$$

denote a standardized version of the constant a. The likelihood of the truncated normal distribution with censoring can be expressed as

$$L(\theta \mid \mathbf{y}) = \prod_{i \in S_0} \frac{\Phi\!\left(\frac{\nu - x_i^T \beta}{\sigma}\right) - \Phi(a_i^*)}{1 - \Phi(a_i^*)} \prod_{i \in S_1} \frac{\phi\!\left(\frac{y_i - x_i^T \beta}{\sigma}\right)}{\sigma\left[1 - \Phi(a_i^*)\right]}.$$
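In code, the regression log-likelihood simply replaces µ with $x_i^T \beta$ observation by observation. A standard-library Python sketch (the tcensReg R package evaluates this likelihood with analytic gradients; the function name here is ours):

```python
from math import log
from statistics import NormalDist

def tcensreg_loglik(y, X, beta, sigma, nu, a=0.0):
    """Log-likelihood of the left-truncated, left-censored normal
    regression model: observation i has latent mean x_i^T beta."""
    z = NormalDist()
    ll = 0.0
    for yi, xi in zip(y, X):
        mu_i = sum(b * v for b, v in zip(beta, xi))
        a_star = (a - mu_i) / sigma  # standardized truncation point
        if yi <= nu:  # censored at the detection limit
            ll += log(z.cdf((nu - mu_i) / sigma) - z.cdf(a_star))
        else:         # fully observed
            ll += log(z.pdf((yi - mu_i) / sigma) / sigma)
        ll -= log(1 - z.cdf(a_star))  # truncation normalizer
    return ll
```

With an intercept-only design matrix this reduces to the single mean model, which provides a convenient sanity check of an implementation.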

Heteroskedastic Variances
We now relax the assumption of homogeneous variance and consider the case of independent groups with different variances. Assume we have samples drawn independently from J truncated normal populations, each with a common truncation value but a possibly different variance, according to the model

$$Y_{ij}^* \sim \mathrm{TN}(x_{ij}^T \beta, \sigma_j^2, a), \qquad i = 1, \dots, n_j, \quad j = 1, \dots, J,$$

where $n_j$ is the number of observations in group j and $Y_{ij}^* \perp\!\!\!\perp Y_{i'j'}^*$ for all $i \ne i'$ and $j \ne j'$. Assume that observations are censored at the value ν to create a sample of independent random variables with pdf

$$f_{Y_{ij}}(y) = \left[\frac{\Phi\!\left(\frac{\nu - x_{ij}^T \beta}{\sigma_j}\right) - \Phi(a_{ij}^*)}{1 - \Phi(a_{ij}^*)}\right]^{1_{\{y = \nu\}}} \left[\frac{\phi\!\left(\frac{y - x_{ij}^T \beta}{\sigma_j}\right)}{\sigma_j\left[1 - \Phi(a_{ij}^*)\right]}\right]^{1_{\{y > \nu\}}}.$$

Because groups are independent, the log-likelihood becomes

$$\ell(\theta \mid \mathbf{y}) = \sum_{j=1}^{J} \left\{ \sum_{i \in S_{0j}} \log\left[\Phi\!\left(\frac{\nu - x_{ij}^T \beta}{\sigma_j}\right) - \Phi(a_{ij}^*)\right] - \sum_{i=1}^{n_j} \log\left[1 - \Phi(a_{ij}^*)\right] - n_{1j} \log \sigma_j + \sum_{i \in S_{1j}} \log \phi\!\left(\frac{y_{ij} - x_{ij}^T \beta}{\sigma_j}\right) \right\},$$

where $a_{ij}^* = \frac{a - x_{ij}^T \beta}{\sigma_j}$, $S_{0j}$ and $S_{1j}$ are the sets of censored and uncensored observations, respectively, in the jth group, and $n_{1j}$ is the number of uncensored observations in the jth group.
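The heteroskedastic extension only swaps in a group-specific standard deviation. A Python sketch under the same conventions (group labels index into a list of standard deviations; the function name is ours, not the tcensReg API):

```python
from math import log
from statistics import NormalDist

def tcensreg_het_loglik(y, X, groups, beta, sigmas, nu, a=0.0):
    """Log-likelihood with group-specific variances: J independent
    groups share the truncation point a and censoring point nu, and
    groups[i] gives the variance group of observation i."""
    z = NormalDist()
    ll = 0.0
    for yi, xi, j in zip(y, X, groups):
        mu_i = sum(b * v for b, v in zip(beta, xi))
        s = sigmas[j]                # group-specific sigma_j
        a_star = (a - mu_i) / s
        if yi <= nu:                 # censored contribution
            ll += log(z.cdf((nu - mu_i) / s) - z.cdf(a_star))
        else:                        # uncensored contribution
            ll += log(z.pdf((yi - mu_i) / s) / s)
        ll -= log(1 - z.cdf(a_star))
    return ll
```

Setting J = 1 recovers the homoskedastic regression log-likelihood, so the two implementations can be checked against each other.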

Obtaining Maximum Likelihood Estimates
Our goal is to find the parameter values that maximize the log-likelihoods of Equations 5, 6, and 7. However, closed form solutions do not exist, so an iterative procedure is required to obtain maximum likelihood estimates. One approach is the Newton-Raphson algorithm based on a Taylor series expansion, as discussed in Chapter 14 of Lange (2010). While the Newton-Raphson method has attractive local convergence guarantees and reliable performance (Ypma, 1995), each step requires evaluation of the Hessian matrix, which can become computationally expensive for a large set of predictors. Alternative optimization routines, such as quasi-Newton BFGS (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) or conjugate gradient (Fletcher and Reeves, 1964), require only evaluation of the likelihood function and its gradient; they often need additional iterations but have reduced memory requirements and faster computing times.
Within R, other optimization packages such as the maxLik package from Henningsen and Toomet (2011) can also be used to find maximum likelihood estimates. This package is called in the censored only and truncated only maximum likelihood estimation packages in R such as censReg by Henningsen (2010) and truncreg by Croissant and Zeileis (2018).
We developed the standalone R package tcensReg to solve the novel likelihood equation of the truncated normal distribution with censoring. This software package uses analytic results of the gradient and Hessian for the corresponding model of interest in either Equation 6 or Equation 7 derived in Appendices A, B, C, and D. Several optimization routines are available within the software including conjugate gradient, Newton-Raphson, and BFGS. This package uses familiar model syntax and has additional functionality to estimate parameters for the censored only or truncated only settings similar to the censReg and truncreg packages, respectively.

Simulation Study
We conducted a simulation study to compare the performance of our method to that of five methods of estimating the mean and standard deviation of the underlying normal distribution from a truncated normal distribution with censored observations. Method 1 is the gold standard method which uses the true uncensored observations, and accounts for the truncation in the estimation procedure by using the appropriate truncated log-likelihood. Method 2 uses the same uncensored truncated data but does not adjust for truncation in the normal distribution likelihood function. Methods 3-6 use the censored truncated observations but differ in how they treat censoring and truncation. Method 3 imputes all censored values with the detection limit and uses maximum likelihood estimation with a normal distribution likelihood, while Method 4 imputes all values as half of the detection limit and also uses normal maximum likelihood estimation. Both the DL and 1/2 DL imputation methods have been shown to perform poorly in cases with censoring (Hornung and Reed, 1990; Helsel, 1990; Lubin et al., 2004), and are expected to show even worse performance in this setting since they do not account for censoring or truncation. However, they are still sometimes used in practice. Method 5 uses Tobit regression, which incorporates censoring into the likelihood. This method is often recommended when the assumption of normality seems reasonable. Finally, Method 6 is our proposed maximum likelihood estimation procedure described in Section 3.2 which takes into account not only the censoring but also the truncation of the underlying distribution. These six methods will be referred to as: 1) Gold Standard (GS), 2) Uncensored with no truncation adjustment (Uncens NT), 3) Detection Limit (DL), 4) 1/2 Detection Limit (1/2 DL), 5) Tobit regression (Tobit), and 6) censored regression with truncation adjustment (tcensReg), our proposed method.

Set-up
Three different sets of simulation studies were conducted to compare the six methods. In the first simulation study, values from a single mean model were drawn to compare performance of the methods in terms of bias and mean squared error (MSE) for estimating the mean and standard deviation of the underlying latent distribution. The second simulation study focused on estimating the difference of the means of two independent populations and their common variance. Finally, the third simulation study assessed their performance in a non-inferiority test setting similar to that of our motivating example. The simulations were conducted in R version 3.5.1 (R Core Team, 2018).
For the first simulation study, values were simulated from a truncated normal distribution, and then censored. A constant value of a = 0 was used to represent zero-truncation. The values of µ and σ 2 were chosen to approximate the marginal distributions of the log contrast sensitivity scores from the application introduced in Section 2. Typical values of mean log contrast sensitivity ranged from 0.7 to 1.1, and thus we used µ ∈ {0.7, 0.8, 0.9, 1.0, 1.1}. Based on standard deviations observed in the contrast sensitivity data, for the simulation we used σ ∈ {0.40, 0.45, 0.50}. This created a total of 15 (5 × 3) parameter combinations for µ and σ 2 .
Observations from a truncated normal distribution, y ∼ TN(µ, σ², a), can be simulated by transforming samples from a uniform distribution on the interval [0, 1] using the inverse probability transformation

$$y = \mu + \sigma\,\Phi^{-1}\!\left[\Phi\!\left(\frac{a - \mu}{\sigma}\right) + p\left(1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)\right)\right],$$

where p represents the sample from the uniform distribution (Burkardt, 2014). This method of inverse transform sampling is implemented in the tcensReg software package. After transforming to the appropriate truncated normal distribution, another dataset was generated by censoring the observations at ν = 0.61. Values that fell below ν were either replaced with the DL (0.61), 1/2 DL (0.305), or marked as censored for Tobit and tcensReg estimation.
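The inverse probability transformation can be sketched in a few lines of standard-library Python (the paper itself uses the tcensReg R package for this step; the function name is ours):

```python
import random
from statistics import NormalDist

def rtruncnorm(mu, sigma, a=0.0, p=None, rng=random):
    """Draw from TN(mu, sigma^2, a) by inverse transform sampling:
    y = mu + sigma * Phi^{-1}(Phi(a*) + p * (1 - Phi(a*))),
    where a* = (a - mu) / sigma and p ~ Uniform(0, 1)."""
    z = NormalDist()
    if p is None:
        p = rng.random()                 # uniform draw on (0, 1)
    p_a = z.cdf((a - mu) / sigma)        # mass cut off by truncation
    return mu + sigma * z.inv_cdf(p_a + p * (1 - p_a))
```

Every draw lies above a by construction, since the argument passed to the normal quantile function always exceeds Φ((a − µ)/σ); when a is far below the support, the draw reduces to the ordinary normal quantile.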
Each of the six estimation methods was used to estimate µ and σ. The data simulation and estimation procedures were repeated for B = 10,000 replications. For each of the six methods, two performance metrics were calculated for θ ∈ {µ, σ}: average bias, $\bar{\hat{\theta}} - \theta$, and mean squared error, $\mathrm{MSE} = \frac{1}{B}\sum_{k=1}^{B}\left(\hat{\theta}_k - \theta\right)^2$, where $\bar{\hat{\theta}} = \frac{1}{B}\sum_{k=1}^{B}\hat{\theta}_k$.

In the second simulation study, we simulated values from two truncated normal populations with different means but common variance and truncation value, i.e.,

$$Y_{i1}^* \sim \mathrm{TN}(\mu_1, \sigma^2, a), \qquad Y_{i2}^* \sim \mathrm{TN}(\mu_2, \sigma^2, a).$$

A constant value of a = 0 was used to represent zero-truncation, and a range of values of µ1, µ2, and σ² were selected to produce data similar to the application. Population 1 approximated a monofocal intraocular lens, while Population 2 approximated a multifocal intraocular lens. Means for monofocal lenses took values µ1 ∈ {1.0, 1.1}, and the difference between multifocal and monofocal lenses was set to range from no difference to a clinically significant difference of 0.3 log units (ISO 11979-7, 2018), i.e., δ ∈ {−0.3, −0.2, −0.1, 0}. Therefore the mean of Population 2 was set as µ2 = µ1 + δ. The common standard deviation was assumed to take values σ ∈ {0.40, 0.45, 0.50}. The two-population simulation had a total of 24 (2 × 4 × 3) parameter combinations for µ1, µ2, and σ².
A total of 100 observations from each population were sampled. Again, for each population a separate censored dataset was generated using the censoring threshold ν = 0.61, and values that fell below ν were replaced with the DL (0.61), 1/2 DL (0.305) or marked as censored for Tobit and tcensReg estimation. Each of the six estimation methods was used to estimate the mean difference between Population 1 and 2, δ, and the common standard deviation, σ. Data simulation and estimation were repeated for B = 10, 000 replications. For each of the six methods, average bias and MSE were calculated.
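The two performance metrics are straightforward to compute from the replicate estimates. A small illustrative helper (not taken from the paper's code):

```python
def bias_and_mse(estimates, theta):
    """Average bias (mean of the B replicate estimates minus the
    truth) and mean squared error over the replications."""
    B = len(estimates)
    mean_est = sum(estimates) / B
    bias = mean_est - theta
    mse = sum((e - theta) ** 2 for e in estimates) / B
    return bias, mse
```

Note that MSE decomposes as squared bias plus estimator variance, which is why a method can appear precise (low variance) yet still carry high MSE when it is badly biased, a pattern that recurs in the results below.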
The third simulation study assessed the performance of the methods in the context of a non-inferiority test. Here we focused on the Type I error rates of the various methods. Type I error rates are particularly important for non-inferiority tests because, if non-inferiority is falsely accepted, patients may decide between products based on non-efficacy factors such as price and side effects, assuming the products to be similar when in fact one is truly superior. For the non-inferiority test, data were simulated with a true difference of δ = −0.15 in the two population model, while varying µ1 ∈ {1.0, 1.1} and σ ∈ {0.40, 0.45, 0.50} for a total of 6 different non-inferiority scenarios. Then each of the six methods was used to construct 1 − α confidence intervals for δ. The test is specified as one-sided because it is known that multifocal contrast sensitivity is less than monofocal contrast sensitivity. Constructing 1 − α confidence intervals and comparing the lower bound to the non-inferiority margin will result in a one-sided hypothesis test at the α level.
If the lower bound of the confidence interval does not cover the true value of δ, then the non-inferiority hypothesis would be falsely accepted. A total of 100 observations were drawn from each population to construct the confidence intervals. The hypothesis test was repeated for B = 10, 000 replications and the Type I error rate was calculated as the percent of replications where the lower bound of the 1 − α confidence interval was greater than −0.15.
The choice of initial starting values θ (0) is important since optimization algorithms can provide local rather than global convergence. To ensure that the starting values are reasonable, we recommend using initial estimates from a censored regression model. These estimates showed excellent rates of convergence for our simulation settings.

Results
Table 2 shows expected censoring and truncation percentages and the ratio of censoring to truncation for each of the 15 parameter value scenarios in the single mean model. The expected percent of censoring ranges from 10.8% to 38.6%, and the expected percent of truncation is typically ≤ 5% but is slightly higher when the mean is closer to the truncation value of 0, i.e., µ ∈ {0.7, 0.8}. The ratio of censoring to truncation varies from 4.68 to 35.87, where values greater than 1 indicate more censoring than truncation. In general, for a fixed µ, as σ increases the expected censoring and truncation increase while the ratio of censoring to truncation decreases.
Performance of the six methods in terms of average bias for µ is shown in Figure 4. The figure shows that the gold standard was essentially unbiased with only slight negative bias for low values of µ. The average bias of tcensReg was slightly negative for all scenarios, meaning that the estimated mean was smaller than the true value. The slight negative bias increased as the amount of censoring increased, i.e., as µ decreased for a fixed σ and as σ increased. However, the absolute bias remained small and was always below 2% of the true µ value. The Tobit, Uncens NT and DL methods consistently had positive bias, with the estimated mean consistently larger than the true value. The amount of bias also increased more rapidly for these other methods as censoring increased, compared to tcensReg. The 1/2 DL method had variable trends in bias with positive bias in some settings and negative bias in others. In particular, when σ = 0.40, the 1/2 DL method overestimated µ when µ ≤ 0.8 and underestimated it when µ > 0.8.
The tcensReg method was consistently closer to the gold standard compared to the Tobit method, especially in situations with high censoring, i.e., σ = 0.50. There were high censoring settings in which the tcensReg method and Tobit method performed similarly; for example, absolute bias was similar when µ = 0.9 and σ = 0.40, which had 22.47% expected censoring. For this scenario, the ratio of censoring to truncation is 18.42 meaning that there is more than 18 times more expected censoring than truncation. For cases where the ratio of censoring to truncation is low, the tcensReg method tends to outperform the Tobit method. When the ratio of censoring to truncation is greater, the Tobit method is comparable in terms of absolute bias to the tcensReg method, but with different directions of bias.
The precision of the estimates of the mean, as reflected by their log MSE, is shown in Figure 5. MSE was transformed to the log scale because there was a great disparity between the values for the DL point imputation method and the other five methods. Here, the average log MSE of each method should be compared to that of the gold standard. The DL method consistently had the highest log MSE of all the methods. The tcensReg method had higher log MSE than the Tobit, 1/2 DL, and Uncens NT methods, suggesting that these methods were outperforming tcensReg. However, the average log MSE for these three methods often fell below that of the gold standard, which reflects an underestimation of the true sampling variance. In contrast, the tcensReg method always maintained a log MSE greater than or about equal to that of the gold standard. Overall, the tcensReg method avoided false precision.

Figure 6 provides evidence that other methods such as Tobit, 1/2 DL, and Uncens NT also systematically underestimated the variance of the underlying distribution. The Tobit and Uncens NT methods always had negative bias, meaning that these methods underestimated σ, with this bias being especially pronounced when σ was large and µ was small. Similar to the bias for µ, the bias of the 1/2 DL method when estimating σ was variable, with positive bias in low censoring scenarios and negative bias in higher censoring scenarios. Overall, the average bias for the tcensReg method was generally closest to the gold standard, with only slight overestimates of σ. However, we note that again the ratio of censoring to truncation appears to play an important role when comparing the Tobit and tcensReg methods. The Tobit method slightly outperformed the tcensReg method for scenarios with σ = 0.40. In these settings the censoring to truncation ratio is higher, as shown in Table 2, meaning that censoring is expected to occur much more frequently than truncation. Figure 7 shows the log MSE for each method.
Again, the tcensReg method protected against false precision, with log MSE slightly greater than that of the gold standard. The 1/2 DL method had artificially low MSE for almost all parameter scenarios, while the Uncens NT method alternated between over- and underestimation. The Tobit method was often the closest to the gold standard.
In the two-population simulation study, the parameter of primary interest was the difference of the means, δ. Figure 8 shows the performance of each method with respect to average bias for δ. For all values of δ, the tcensReg and gold standard methods were essentially unbiased. The other four methods all had positive bias, corresponding to underestimation of the true difference in means. Similar to the single mean model, the greatest difference between the tcensReg method and the other methods occurred when the amount of censoring was greater, i.e., when µ1 = 1.0 and |δ| is large for a fixed σ.
For all six methods, the average bias of the estimates of δ when δ = 0 was approximately 0. As our simulation study for the single mean model showed, bias generally increases as the mean decreases for fixed σ; see Figure 4. When the two means were equal, i.e., δ = 0, the biases for estimating each mean were also equal and thus there was no bias for estimating δ.
The results for log MSE in the two-population simulation study (Figure 9) show that tcensReg had log MSE close to the gold standard for all scenarios. All of the other methods, with the exception of DL, had log MSE below that of the gold standard, indicating false precision. The DL method had high log MSE when δ was large (and its estimates were highly biased), and artificially low log MSE when δ was small. Overall, the tcensReg method was more successful than the other methods in capturing the true variability in the estimation procedure.
Also of interest in the two-population model is the estimate of the common standard deviation, σ. Figure 10 shows the average bias of the estimates of σ for each method. Similar to the single mean model, all methods other than tcensReg tended to underestimate the true standard deviation, especially when σ was large. An exception was the 1/2 DL method, which alternated between over- and underestimating σ but generally performed worse as |δ| increased. Figure 11 shows that the tcensReg and Tobit methods had similar precision in the two-population model with respect to the estimation of σ, while the other methods, i.e., DL, 1/2 DL, and Uncens NT, generally performed considerably worse. Overall in the two-population study, the tcensReg method had lower average bias in estimating δ and σ along with precision consistently near the gold standard, without the pitfalls of false precision.
Results for testing the non-inferiority hypothesis with α = 0.05 are shown in Table 3. For all scenarios, the tcensReg method had slightly higher Type I error rates than the gold standard, and the Tobit method had slightly higher Type I error rates than the tcensReg method. The DL method had Type I error rates 2.5-4 times higher than the nominal value. The other single imputation method, 1/2 DL, had inflated Type I error rates in the medium and high variance settings. Especially when censoring was higher, i.e., µ1 = 1.0 and µ2 = 0.85, the other methods had substantially higher Type I error rates compared to tcensReg.

Application
We now apply the methods to our contrast sensitivity application introduced in Section 2. The goal was to compare visual quality, measured as contrast sensitivity, for monofocal versus multifocal lenses implanted following cataract surgery, using data collected from two clinical trials. Table 4 shows the number of participants who received each type of IOL and the percent of patients whose observation was censored under each glare condition. Each group has over 150 patients; Multifocal 2 is larger than the others due to a 2:1 allocation ratio in that trial. As mentioned previously, the amount of censoring is greater with glare than without glare, with an average difference of 9% censoring. However, all groups have relatively high levels of censoring, ranging from 11% to 29%.
Separate models were fit for each pairwise comparison between the monofocal lens and each multifocal lens, with and without glare. Initially, each pairwise comparison was fit allowing the standard deviation to differ by lens type (heteroskedastic). Figure 12 displays the 95% confidence interval for each group's estimated standard deviation, σj. The standard deviations for the multifocal lenses are sufficiently close to that of the monofocal lens under both glare conditions, with a high level of overlap in the confidence intervals. This suggested that a common standard deviation for lens type is appropriate. In the subsequent analysis, each model was re-run assuming a common standard deviation in the two groups. The parameters of interest included the mean of each population (monofocal or multifocal), the mean difference between populations, and the common standard deviation. We compared four methods for estimating the parameters: DL imputation for censored observations, 1/2 DL imputation, Tobit regression, and our truncated censored method, tcensReg.
We also tested whether each multifocal lens is non-inferior to the monofocal lens with respect to contrast sensitivity. Based on regulatory guidelines, the non-inferiority margin for contrast sensitivity is set at -0.15 log units (ISO 11979-7, 2018). The non-inferiority test was a one-sided test with α = 0.05, conducted by constructing a two-sided 90% confidence interval (CI) for δ and comparing the lower bound of the CI to the non-inferiority margin. To establish non-inferiority, the lower bound of the CI must lie above the non-inferiority margin. Figure 13 shows 90% CIs for δ using the four methods. For all of the methods, the lower bounds of the CIs fell below the non-inferiority margin for all three lens comparisons and both glare conditions. Thus, using any of these methods, we would be unable to conclude that a particular multifocal lens is non-inferior to the monofocal lens in terms of contrast sensitivity. For each comparison, the tcensReg method had the largest estimate of |δ| and the DL method had the smallest. The point estimates of δ from the Tobit and 1/2 DL methods were intermediate and similar.
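The decision rule just described is simple to state in code. A minimal sketch follows; the estimate and standard error below are illustrative placeholders, not values from the trials:

```python
from scipy.stats import norm

def noninferior(delta_hat, se, margin=-0.15, alpha=0.05):
    """One-sided non-inferiority test at level alpha, carried out by comparing
    the lower bound of a two-sided (1 - 2*alpha) CI for delta to the margin."""
    z = norm.ppf(1 - alpha)        # 1.645 for alpha = 0.05
    lower = delta_hat - z * se     # lower bound of the two-sided 90% CI
    return lower, lower > margin   # non-inferior only if lower bound > margin

# hypothetical estimate: delta_hat = -0.10 with standard error 0.04
lower, ni = noninferior(-0.10, 0.04)
print(round(lower, 3), ni)  # -0.166 False: lower bound below -0.15, so not shown
```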
The tcensReg method yielded the longest CIs (mean length = 0.164 log units). Mean CI lengths for the Tobit, 1/2 DL and DL methods were 0.158, 0.156 and 0.133 log units, respectively. The differences in CI length are a reflection of differences in estimates of σ, shown in Figure 14. The DL method had consistently low estimates of σ, while the tcensReg method had the highest estimated values of σ. The 1/2 DL and Tobit methods gave similar estimates intermediate between the other two methods.
Interpreting these results in light of our simulation studies, if the latent normality assumption holds for these data, we would expect the estimates from the tcensReg method to have the least bias. The other methods typically underestimated both the difference in means and the population standard deviation, which would tend to produce confidence intervals that are more likely to falsely exclude the non-inferiority margin and thus lead to higher Type I error rates, especially when the rate of censoring is high. The wider confidence intervals from the tcensReg method observed in Figure 13 may better protect against Type I errors.

Discussion
In this manuscript, we developed a maximum likelihood method for estimating parameters from a truncated normal distribution when observations are subject to censoring. We have also developed the R package tcensReg to implement the method. We showed in simulations that our method has substantially less bias for estimating the mean and standard deviation of the underlying latent normal distribution than other commonly used methods for a range of simulation settings. Our method also had close to the nominal Type I error rate for non-inferiority testing, while single imputation with either the detection limit or half the detection limit and Tobit regression often had inflated Type I error rates when the censoring rate was high.
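To make the estimation concrete, the following is a minimal single-mean sketch of the truncated-normal-with-censoring likelihood. The tcensReg package itself is written in R and supports covariates; this Python version, with function and variable names of our choosing, is illustrative only:

```python
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.optimize import minimize

def tcens_negloglik(theta, y, cens, nu, a):
    """Negative log-likelihood for Y ~ Normal(mu, sigma) truncated at a,
    with observations at or below nu censored. theta = (mu, log(sigma))."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    z = lambda c: (c - mu) / sigma
    ll_cens = np.log(norm.cdf(z(nu)) - norm.cdf(z(a)))     # P(a < Y <= nu)
    ll_obs = np.sum(norm.logpdf(z(y[~cens])) - log_sigma)  # observed density
    trunc = len(y) * np.log(1 - norm.cdf(z(a)))            # truncation adjustment
    return -(cens.sum() * ll_cens + ll_obs - trunc)

# simulate a latent Normal(0.8, 0.5) truncated at a = 0, then censor at nu = 0.61
rng = np.random.default_rng(1)
mu, sigma, a, nu, n = 0.8, 0.5, 0.0, 0.61, 20_000
y = truncnorm.rvs((a - mu) / sigma, np.inf, loc=mu, scale=sigma, size=n,
                  random_state=rng)
cens = y <= nu
y = np.where(cens, nu, y)  # censored values recorded at the detection limit
fit = minimize(tcens_negloglik, x0=[np.mean(y), np.log(np.std(y))],
               args=(y, cens, nu, a), method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)  # should be close to the latent values 0.8 and 0.5
```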
The single mean simulation study showed that as the levels of censoring and truncation increased, the bias of all other methods generally increased, often dramatically. As expected, point imputation of the detection limit was consistently the worst performing method, but even the Tobit method had large bias under certain conditions. An important factor when comparing the Tobit and tcensReg methods in the single mean model was not only the raw levels of censoring and truncation but also their ratio. For low censoring-to-truncation ratios, generally below 20, the tcensReg method outperformed Tobit, while for high ratios the two methods performed similarly in estimating the mean and tcensReg performed somewhat worse in estimating the standard deviation. This trend may point to scenarios in which censoring dominates truncation, so that the Tobit method can yield more precise estimates, particularly of the standard deviation. It is important to note that the tcensReg method was the only method to have positive bias in estimating the standard deviation, while the other methods were typically negatively biased. Positive bias in this sense leads to conservative hypothesis testing.
In the two-population simulation study, we observed trends similar to those for the single mean model. The tcensReg method uniformly outperformed all other methods with respect to average bias in estimating the difference between the population means and the common standard deviation. The greatest difference between the methods occurred when the censoring rate was high and the difference between the population means was large. Unlike the single mean model, the ratio of censoring to truncation did not appear to play a significant role in the accuracy of parameter estimation for the two-population model. Similar to the single mean model, the precision of the alternative methods was misleading, yielding values of the mean squared error that appeared to outperform the gold standard. Across all parameter scenarios, the tcensReg method had significantly lower average bias and did not fall victim to false precision.
Results from the non-inferiority simulation study confirmed the trends observed in the previous simulation studies. Methods other than tcensReg tended to underestimate the true variability of the underlying data generating mechanism, which led to precise but biased results. This bias tended to increase as the censoring increased, and this combined with underestimated values of the true standard deviation led to substantially inflated Type I error rates. The tcensReg method consistently had estimates closest to the gold standard.
In the application, the main difference among the methods was a larger estimate of the difference between means when using the tcensReg method. The tcensReg method also had the widest confidence intervals due to higher estimates of the variance. The DL method results suggested that it would be the method most likely to lead to a finding of non-inferiority, but as shown in the simulation study, this method can have highly inflated Type I error rates. The 1/2 DL, Tobit, and tcensReg methods are more likely to capture the true difference and the variability of the estimate, with the tcensReg method suggesting that the true difference may be even greater than the other two methods indicate. In our particular application, the methods did not differ with respect to the ultimate conclusion of the non-inferiority hypothesis test. However, in cases in which non-inferiority is more marginal, the choice of method could make a difference.
The goal of the estimation methods presented here is to estimate the mean and standard deviation of the latent normal distribution rather than the mean and standard deviation of the truncated normal distribution. This affects the interpretation of the results. When the truncation probability is less than 50 percent, i.e., when µ > a, the mean of the latent normal corresponds to the mode of the truncated normal. In many settings the mode of the truncated normal may be a more clinically meaningful measure of central tendency than its mean. For example, in contrast sensitivity testing, it may be clinically relevant to understand what factors affect the mode rather than the mean in order to target where the greatest proportion of patients lie.
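This correspondence is easy to verify numerically: when µ > a, the density of the left-truncated normal peaks at the latent mean µ. A quick grid check using scipy's truncnorm (parameter values chosen for illustration):

```python
import numpy as np
from scipy.stats import truncnorm

mu, sigma, a = 0.8, 0.5, 0.0  # latent mean above the truncation point
grid = np.linspace(0.0, 3.0, 30_001)
# scipy's truncnorm takes the truncation bounds in standardized units
pdf = truncnorm.pdf(grid, (a - mu) / sigma, np.inf, loc=mu, scale=sigma)
mode = grid[np.argmax(pdf)]
print(mode)  # matches the latent mean, 0.8, up to the grid resolution
```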
The methodology as presented makes a strong parametric assumption of normality. For studies with small to moderate sample sizes, checking the reasonableness of the normality assumption may be difficult and the assumption may be increasingly tenuous as the amount of censoring and truncation increase. While the unobserved data may not be exactly normal, a normality assumption at least reflects an assumption that the unobserved part of the distribution has a monotone decreasing shape, which is frequently a reasonable assumption. In general, whenever an analysis involves unverifiable assumptions, conducting a sensitivity analysis is prudent. The tcensReg method can be viewed as an option in the toolkit of the statistician that can be used as part of a sensitivity analysis, which might also include Tobit regression.
Another potential limitation is that the detection threshold and truncation value are assumed to be known. While assuming that the detection threshold is known is often reasonable, the truncation value is not observable in the data due to the censoring and so must be incorporated in the analysis based on subject matter expertise.
In this manuscript, we have focused on the setting of left censoring and left truncation. The methods can be easily extended to handle right censoring with right truncation. Future work could extend the methods to handle censoring in both tails of the distribution. In our application, we noted some suggestion of censoring due to an upper detection threshold; see Figure 2. Conducting the analysis adjusting for left and right censoring could potentially improve the accuracy of the estimates.
Other possible extensions include accommodating non-normal parametric distributions. This estimation framework could also be extended to linear mixed-effects models with repeated measurements using the extended Newton-Raphson technique proposed by Lindstrom and Bates (1988).

Acknowledgments
This methodology was developed in part during an internship program at Alcon Laboratories, Inc. (Fort Worth, TX, USA). Thank you to the Biostatistics team at Alcon for their feedback and suggestions during the development of this manuscript. R code for generating the data used in the simulations discussed in Section 4 is available at https://github.com/williazo/tcensReg/tree/master/inst/script. Data used for the application in Section 5 are proprietary to Alcon Laboratories, Inc.

Financial disclosure
None reported.

Conflict of interest
Hyung-Woo Kim is a former employee of Alcon Laboratories, Inc.

[Table 2 notes: censoring threshold set at ν = 0.61 and truncation value at a = 0. The final column compares the expected percent of censoring to the expected percent of truncation, e.g., when µ = 1.1 and σ = 0.50 we expect 10.91 times more censoring than truncation.]

[Table 3 notes: nominal α set to 0.05. GS = Gold Standard, i.e., uncensored observations with truncation adjustment; Uncens NT = uncensored data with no truncation adjustment; DL = detection limit; Tobit = Tobit censored regression with no truncation adjustment; tcensReg = censored regression with truncation adjustment.]

[Figure 2 caption: Log contrast sensitivity scores are converted from contrast sensitivity threshold scores via Table 1. Detection limit for 12 CPD occurs at ν = 0.61. Histogram is shown in the background with a Gaussian kernel density estimate in the foreground, bandwidth set to 0.2.]

Figure 3: Truncated Normal Distribution with Censoring
Potential density for a left-truncated normal distribution with left censoring. The density shown was created with µ = 0.8, σ = 0.5, a = 0, and ν = 0.61, the truncation and censoring values in our application.

[Legend repeated across the two-population figures: the vertical dashed black line corresponds to the case when δ = 0, i.e., µ1 = µ2. GS = Gold Standard, i.e., uncensored observations with truncation adjustment; Uncens NT = uncensored data with no truncation adjustment; DL = detection limit; Tobit = Tobit censored regression with no truncation adjustment; tcensReg = censored regression with truncation adjustment.]

A Formula for the Gradient
In the model discussed in Section 3.2, the parameters to be estimated are θ = (β, σ)^T, where β = (β_1, . . . , β_{p−1})^T from Equation 6. The single mean model is a special case of the linear model in which x_i^T β = µ. Inspecting the log-likelihood in Equation 6, it is convenient to calculate the partial derivatives with respect to ln(σ) rather than σ. Estimates and standard errors for σ are then obtained by applying the inverse transformation and the delta method.
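The back-transformation step can be sketched in a few lines: if the optimizer returns an estimate and standard error on the ln(σ) scale, the delta method gives SE(σ̂) = σ̂ · SE(ln σ̂), since the derivative of exp is exp itself. The numbers below are illustrative, not output from any fitted model:

```python
import numpy as np

def backtransform_sigma(log_sigma_hat, se_log_sigma):
    """Delta method: map an estimate and standard error on the ln(sigma)
    scale back to the sigma scale, SE(sigma_hat) = sigma_hat * SE(ln sigma_hat)."""
    sigma_hat = np.exp(log_sigma_hat)
    return sigma_hat, sigma_hat * se_log_sigma

# hypothetical optimizer output on the log scale: ln(sigma_hat) = -0.69, SE = 0.02
sigma_hat, se_sigma = backtransform_sigma(-0.69, 0.02)
print(round(sigma_hat, 4), round(se_sigma, 4))  # 0.5016 0.01
```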
Note that another possible parametrization using one-to-one functions is δ = β/σ and η = 1/σ, as used by Tobin (1958). However, we present the derivation using β and ln(σ), analogous to the approach of Henningsen (2010) in the R package censReg.

Throughout this derivation, let c_i^* = (c − x_i^T β)/σ for some constant c, let φ and Φ denote the standard normal density and distribution functions, and let C and O index the censored and fully observed observations, respectively. We define the gradient as the vector of first partial derivatives of the log-likelihood, ∇ℓ(θ) = (∂ℓ/∂β_1, . . . , ∂ℓ/∂β_{p−1}, ∂ℓ/∂ ln(σ))^T. Taking the derivative of the log-likelihood with respect to β_k,

∂ℓ/∂β_k = −(1/σ) Σ_{i∈C} x_ik [φ(ν_i^*) − φ(a_i^*)] / [Φ(ν_i^*) − Φ(a_i^*)] + (1/σ^2) Σ_{i∈O} x_ik (y_i − x_i^T β) − (1/σ) Σ_{i=1}^n x_ik φ(a_i^*) / [1 − Φ(a_i^*)],

for k = 1, . . . , p − 1. Then taking the derivative with respect to ln(σ),

∂ℓ/∂ ln(σ) = Σ_{i∈C} [a_i^* φ(a_i^*) − ν_i^* φ(ν_i^*)] / [Φ(ν_i^*) − Φ(a_i^*)] + Σ_{i∈O} [(y_i^*)^2 − 1] − Σ_{i=1}^n a_i^* φ(a_i^*) / [1 − Φ(a_i^*)],

where y_i^* = (y_i − x_i^T β)/σ. Therefore, the gradient vector is ∇ℓ(θ) with these components.
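Hand derivatives of this likelihood are error-prone, so a finite-difference check is a cheap safeguard. The sketch below codes the negative log-likelihood for the regression model with left truncation at a and left censoring at ν, parametrized by (β, ln σ), together with an analytic gradient of the same form as derived in this appendix, and compares the two numerically (illustrative data and names; a sketch, not the package's internals):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import approx_fprime

def negloglik(theta, y, X, cens, nu, a):
    """-loglik for y_i ~ Normal(x_i' beta, sigma), truncated at a, censored at nu."""
    beta, sigma = theta[:-1], np.exp(theta[-1])
    z = lambda c: (c - X @ beta) / sigma
    ll = np.where(cens,
                  np.log(norm.cdf(z(nu)) - norm.cdf(z(a))),  # censored terms
                  norm.logpdf(z(y)) - np.log(sigma))         # observed terms
    return -(ll.sum() - np.log(1 - norm.cdf(z(a))).sum())    # truncation adjustment

def gradient(theta, y, X, cens, nu, a):
    """Analytic gradient of negloglik with respect to (beta, ln sigma)."""
    beta, sigma = theta[:-1], np.exp(theta[-1])
    nu_s, a_s = (nu - X @ beta) / sigma, (a - X @ beta) / sigma
    y_s = (y - X @ beta) / sigma
    denom = norm.cdf(nu_s) - norm.cdf(a_s)
    # per-observation scores: beta (up to the x_ik factor) and ln(sigma)
    db = np.where(cens, -(norm.pdf(nu_s) - norm.pdf(a_s)) / (sigma * denom),
                  y_s / sigma)
    db -= norm.pdf(a_s) / (sigma * (1 - norm.cdf(a_s)))
    ds = np.where(cens, (a_s * norm.pdf(a_s) - nu_s * norm.pdf(nu_s)) / denom,
                  y_s**2 - 1)
    ds -= a_s * norm.pdf(a_s) / (1 - norm.cdf(a_s))
    return -np.concatenate([X.T @ db, [ds.sum()]])

# illustrative data: intercept plus one covariate, y kept above a = 0
rng = np.random.default_rng(0)
n, nu, a = 500, 0.61, 0.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.abs(rng.normal(0.8 + 0.3 * X[:, 1], 0.5))
cens = y <= nu
y = np.where(cens, nu, y)
theta = np.array([0.7, 0.2, np.log(0.4)])
num = approx_fprime(theta, negloglik, 1e-6, y, X, cens, nu, a)
err = np.max(np.abs(gradient(theta, y, X, cens, nu, a) - num)
             / np.maximum(1.0, np.abs(num)))
print(err < 1e-3)
```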

B Formula for the Hessian
The Hessian is the matrix of second partial derivatives of the log-likelihood with respect to θ = (β, ln(σ))^T, obtained by differentiating each component of the gradient in Appendix A.

C Gradient for Heteroskedastic Model
Assuming that there are samples from J independent populations as discussed in Section 3.2, the parameters to be estimated are θ = (β, σ_1, . . . , σ_J)^T, where β = (β_1, . . . , β_{p−1})^T. Again, the form of the log-likelihood in Equation 7 suggests calculating the partial derivatives with respect to ln(σ_j) rather than σ_j. Let c_{ij}^* = (c − x_{ij}^T β)/σ_j for some constant c.
We define the gradient as ∇ℓ(θ) = (∂ℓ/∂β_1, . . . , ∂ℓ/∂β_{p−1}, ∂ℓ/∂ ln(σ_1), . . . , ∂ℓ/∂ ln(σ_J))^T, for p ≥ 2 and J ≥ 1. The components take the same form as in Appendix A, with c_i^* replaced by c_{ij}^* and with the derivative with respect to ln(σ_j) summing only over observations from population j.
Note that the case where J = 1 is equivalent to the case in Appendix A.