 Research Article
 Open Access
 Published:
Assessing the effect of genetic markers on drug immunogenicity from a mechanistic modelbased approach
BMC Medical Research Methodology volume 20, Article number: 69 (2020)
Abstract
Background
With the growth in use of biotherapic drugs in various medical fields, the occurrence of antidrug antibodies represents nowadays a serious issue. This immune response against a drug can be due either to preexisting antibodies or to the novel production of antibodies from Bcell clones by a fraction of the exposed subjects. Identifying genetic markers associated with the immunogenicity of biotherapeutic drugs may provide new opportunities for risk stratification before the introduction of the drug. However, realworld investigations should take into account that the population under study is a mixture of preimmune, immunereactive and immunetolerant subjects.
Method
In this work, we propose a novel test for assessing the effect of genetic markers on drug immunogenicity taking into account that the population under study is a mixed one. This test statistic is derived from a novel twopart semiparametric improper survival model which relies on immunological mechanistic considerations.
Results
Simulation results show the good behavior of the proposed statistic as compared to a twopart logrank test. In a study on drug immunogenicity, our results highlighted findings that would have been discarded when considering classical tests.
Conclusion
We propose a novel test that can be used for analyzing drug immunogenicity and is easy to implement with standard softwares. This test is also applicable for situations where one wants to test the equality of improper survival distributions of semicontinuous outcomes between two or more independent groups.
Background
Biopharmaceuticals products (BP) such as therapeutic monoclonal antibodies are nowadays a fastgrowing class of drugs whose recent use in clinic has represented a critical step forward in the treatment of many severe autoimmune diseases. Nevertheless, for some patients these BP induce an activation of the immune system, leading to the formation of antibodies against the drug. The consequences range from transient appearance of antidrug antibodies (ADA) without any clinical significance to severe loss of efficiency by either blocking the drug or enhancing the clearance [1].
The mechanisms leading to biotherapy immunogenicity can either be patientrelated (e.g: genetic background, immunological status) or treatmentrelated (e.g: drug characteristics and formulations) but their relative contributions to the development of ADA is currently not fully understood and still remain to be deciphered. If major achievements for minimizing productrelated factors involved in immunogenicity have been recently made, thanks to the remarkable progress in biopharmaceutical engineering, there is still an urgent need for identifying nonmodifiable patientrelated factors that may provide a basis for stratified or personalized therapeutic approaches. However, if an extensive research has been conducted to study the immunogenic potential of the biotherapies, less has been done to identify patients who are either at high or low risk for ADA development. In this search for patientrelated predictive factors of immunogenicity, the genetic diversity in immune regulatory genes, is supposed to play a major role in the development of ADA [1, 2]. If early studies about drug immunogenicity assessment have mainly relied upon responsebased endpoints, timetoevent analyses are more and more often recommended for taking into account the dynamic of ADA production. For such studies, subjects that have not been previously exposed to a particular BP are followed up for a certain period of time after the first BP administration. The main outcome is the first time of ADA detection after the initial drug administration and the objective is to identify factors that are related to these timetoevents [3, 4].
The motivation behind this work is that such timetoevent analysis is not straightforward as it should take into account that the population under study is usually a mixture of preimmune, immunereactive and immunetolerant subjects. Here, the socalled preimmune subjects are those with preexisting antibodies. These preexisting ADA have been observed mainly among patients with autoimmune diseases and can originate either from the innate immune system or from the adaptive immune responses to homologous ingredients [5]. The preimmune status of a subject can be known by doing a screening test before the first administration of the drug. In contrast, the socalled immunetolerant subjects are those whose immune system is in a state of unresponsiveness to the exposure of the drug and thus will not produce ADA. Finally, the socalled immunereactive subjects are those who have no preexisting ADA and who are able to produce detectable levels of antibodies. Their time to ADA detection depends on the dynamic of the B cell clones production. However, for subjects who are not preimmune and are censored during the study, we cannot determine their status as immunereactive or immunetolerant subjects.
Thus, for such analysis, we have to deal with a particular kind of semicontinous data [6, 7] with nonnull proportions of zero and of infinite values coming from the preimmune and immunetolerant subjects, respectively. As a result, we have to use specific models that belong to the class of twopart models [6] but need to be rethought to incorporate a defective (or improper) distribution. Such defective distributions have been mainly considered in oncology for analyzing early stage of cancer. In the literature, there are two main approaches, the oldest one assumes that the population under study is composed of two subpopulations of patients: those who will not experience the event of interest and those who are likely to experience the event of interest during the followup. This kind of twocomponent mixture model incorporates the defect fraction in a parametric or semiparametric framework (for a review see Maller and Zhou [8]). An alternative approach relies upon defining the cumulative hazard as a bounded increasing positive function [9]. This modeling which has a meaningful biological interpretation in clinical oncology belongs to the class of promotion time cure models [10].
In this work, we propose to consider a novel twopart semiparametric improper survival model that can cope with semicontinuous time to event data and from which we can derive an efficient test statistic. Here, the timetoevent is zero for preimmune subjects with a discrete probability mass, whereas its nonzero values (immunereactive or immunetolerant subjects) have an improper survival distribution. This latter improper survival distribution relies upon some biological understanding of the ADA production
The paper is organized as follows. We first present the proposed twopart survival model with its interpretation. Then, we derive a general score test for the null hypothesis of a same timetoevent distribution across the different genotypes. Next, we present the results of simulation experiments comparing the behavior of our proposed score test to a twopart logrank test. Then, the clinical relevance of using the proposed test is exemplified by the analysis of the association between genetic markers and the occurrence of ADA among a cohort of treatmentnaive patients suffering from autoimmune diseases and treated by biotherapies [11].
Methods
Modeling background
Notation
Let G denote the genotype for a biallelic marker. Here, being homozygous for the common variant is noted as G=[AA] and corresponds to the reference group. Being heterozygous and homozygous for the alternative allele are noted as G=[Aa] and G=[aa], respectively.
As the underlying genetic model is unknown, two dummy variables G_{1} and G_{2} are created for investigating various genetic models, including : dominant, additive, recessive and overdominant. In practice, G_{1}=0,1,2 when G=[AA],[Aa],[aa] respectively and G_{2}=1 when G=[Aa] and 0 otherwise.
Thus, if we denote ξ_{1} and ξ_{2} the effects associated with G_{1} and G_{2} respectively (see Table 1), we have a dominant genetic model when ξ_{1}=ξ_{2}, an additive genetic model when ξ_{1}≠0 and ξ_{2}=0, a recessive genetic model when ξ_{2}=−ξ_{1}, and an overdominant genetic model when ξ_{1}=0 and ξ_{2}≠0.
In the following, we assume that the status for being a preimmune subject can be determined before the first administration of the biotherapy. Let Z be the preimmune status variable such as Z=1 if the subject is preimmune and 0 otherwise.
Let denote T the timetoADA detection and C the censoring time. We assume that T and C satisfy the condition of independent and non informative censoring [12]. For each subject i (i=1,...n), X_{i}=min(T_{i},C_{i}) denotes the observed time of followup and \(\delta _{i}=1_{(X_{i}=T_{i})}\phantom {\dot {i}\!}\) the indicator of ADA detection. We also denote \(Y_{i}(t)= 1_{(t \leq X_{i})}\phantom {\dot {i}\!}\) the indicator of being at risk for the event at time t. Here, for Z=1 then T=0 and for Z=0 then T>0. For each patient i, the observed data consist of \({(X_{i},\delta _{i},Z_{i},G_{1_{i}},G_{2_{i}})}\phantom {\dot {i}\!}\).
Twopart improper survival model
We define the survival distribution S^{⋆}(t) for these semicontinuous data such as:
S^{⋆}(t;z)=π^{z}×[(1−π)S(t)^{1−z}]
where S(t) is the conditional survival function for the nonzero values of the timetoevent and π is the probability for the zero values. Here, the survival function S(t) is improper and its limiting value S(∞)=a with 0<a<1 is called the tail defect and represents the probability of not experiencing the event of interest.
In this work, we assume that Z follows a logistic regression model that depends upon the genotype. For the sake of simplicity and without loss of generality, we assume a simple additive genetic model. Thus we have:
\(\pi (G_{1},G_{2})=\frac {e^{(\theta +\beta G_{1})}}{1+e^{(\theta +\beta G_{1})}}\)
where β is the unknown regression coefficient and θ is the intercept.
For the reference group, \(\pi (G_{[AA]}) = \pi _{0}= \frac {e^{\theta }}{1+e^{\theta }}\). When β=0, the proportions of preimmune subjects are identical across the different genotypes.
For the conditional improper survival distribution S(t) we introduce a new semiparametric model which is based on immunological mechanistic considerations and is presented in the next section.
Conditional improper survival distribution
We propose to model the distribution of the time to ADA detection through a simplified mechanistic immunological model whereby each non preimmune subject may or may not be able to produce ADA in response to the introduction of the biotherapy. From a biological perspective, it arises from the activation of unobservable BPspecific (Tdependent) Bcell clones that emerge and become immunereactive ADAproducing clones. At the cellular level, each Bcell clone produces antibodies with a unique antigenbinding site with various levels of ADA affinities. Antibody affinity describes the strength of interaction between the antibody combining site and the relevant antigenic determinants in the drug. It represents an important qualitative parameter of the immune response. Positivity occurs as soon as any one of the Bcell clones is able to produce levels of ADA of sufficient amount and/or affinity for being detected by the assay.
Since Bcell clones are not directly observed, we assume a latent discrete probability distribution for the number of Bcell clones and a continuous distribution for their timetodetection. From those latent distributions, we can deduce the marginal (or population) survival distribution of the time to ADA detection.
More precisely, let K (K=0,…,∞_{+}) denote a latent random variable that represents the number of (unobservable) Bcell clones. Let the random variable T^{j}, associated to the j^{th} latent Bcell clones, be the timetodetection with corresponding timetoevent survival function \(\phantom {\dot {i}\!}A_{j}(t)=A_{0}(t)^{U_{j}}\). Here, A_{0}(t) is a baseline decreasing function such as 1≥A_{0}(t)≥0 and U_{j} is a random variable that represents ADA affinity for the j^{th} clone.
Here, we suppose that K is distributed with probability mass function Φ and is supposed to be independent of T^{j}. We also supposed that the U_{j} are independent and identically distributed with density function Ψ and independent from K.
For patient i with k latent Bcell clone, let denote T_{i}=min_{0≤j≤k}(T^{j}) the timetodetection of the earliest Bcell clone. The conditional (patientspecific) survival function is expressed as:
\(S(t \mid K=k,(U_{1}=u_{1},...,U_{k}=u_{k}))=\Pr \left (T_{i} >t\right) =\Pr \left (T^{1}>t,\ldots,T^{k}>t\right) =A_{0}(t)^{1_{\{k \neq 0\}}\sum _{l=1}^{l=k} u_{l}}=A_{0}(t)^{1_{\{k \neq 0\}}w_{k}}\)
where \(w_{k}=\sum _{l=1}^{l=k} u_{l}\) is a realization of the random variable W_{k} which is the randomly stopped sum of independent random variables [13] U_{l} and 1_{{k≠0}}=1 when k≠0 and 0 otherwise. This can be interpreted as a firstactivation scheme model [14].
In the following, we assume a Katz distribution for the distribution of the number of latent Bcell clones (Φ) and a Gamma distribution Γ(1,τ) with shape parameter unity and scale parameter τ (τ≥0) for the clonal ADA affinity (Ψ). As a consequence, the variable W_{k} has a gamma distribution Γ(k,τ) (denoted in the following as Ψ_{(k,τ)}). We also introduce H_{0}(t) a positive nondecreasing function so that A_{0}(t)= exp{−H_{0}(t)}.
Based upon these latter assumptions, the marginal (population) survival function is given by:
\(S(t)=\sum _{k=0}^{\infty _{+}} \big \{\int _{0}^{\infty _{+}} A_{0}(t)^{s}\Psi _{(k,\tau)}(s)ds \big \} \Phi (k)\)
We recall that the Katz family [13] is the set of counting distributions for nonnegative integers whose probability mass function satisfies the following firstorder recurrence formula such as: \(\Pr (x+1)=\Pr (x) \times \left (\frac {\omega +\gamma x}{1+x}\right);\text { }x=0,\ldots,\infty _{+} \) where ω>0 and γ<1. If ω+γx<0, then Pr(x+j) is equal to zero for all j>0. The mean and variance are \(\mu = \frac {\omega }{1\gamma }\) and \(\sigma ^{2} = \frac {\omega }{(1\gamma)^{2}}\). Moreover, it is worth noting that \(\omega = \frac {\mu ^{2}}{\sigma ^{2}}\) and γ is linked to the dispersion index such as \(\frac {\sigma ^{2}}{\mu } = (1\gamma)^{1}\).
The probability generating function can be written such as [13]:
This family of distribution covers a wide spectrum of discrete distributions that encompasses the binomial, the negative binomial and the Poisson distributions. If γ=0, we have a Poisson distribution (equidispersion) with S(t=∞_{+})=e^{−ω}. When γ<0, we have underdispersion with S(t=∞_{+})>e^{−ω}. When γ∈[0,1], we have overdispersion with S(t=∞_{+})<e^{−ω}.
In the following, we assume that the distributions Φ and Ψ depend upon the genotype. In practice, we assume that for the reference group [AA] the latent distribution Φ is the Poisson distribution (γ=0) and Ψ is the Gamma distribution Γ(1,τ). In the following, the alternative genetic variant, acting in a dominant, overdominant, additive or recessive way, is associated with changes either to the distribution of the number of clones (over/underdispersion) or to the dispersion in ADA affinity (increase/decrease of τ).
Thus, the marginal survival function for the non preimmune subjects depends upon the genotype such as, for the reference group:
S(t;G_{1}=0∩G_{2}=0)= exp{−ω[1−(1+H_{0}(t)τ)^{−1}]}
with ω>0. For the other genotypes, we have:
where α_{1},α_{2},γ_{1},γ_{2} are the unknown regression coefficients. The parameters γ_{1},γ_{2} are linked to the distribution of the number of the Bcell clones and α_{1},α_{2} are linked to the dispersion in ADA affinity.
When γ_{1}=γ_{2}=0 the improper survival distributions for each genotypes have a same defect such as S(t=∞_{+};G_{1},G_{2})=e^{−ω}. The biological interpretation is that the alternative allele has no effect on the proportion of immunetolerant. When γ_{1}≠0 or γ_{2}≠0, the improper survival distribution varies between genotypes. The biological interpretation is that the alternative allele is associated with changes in the distribution of the number of clones and consequently with changes in the proportion of immunetolerant.
We write the improper survival model in terms of the hazard functions and take its firstorder Taylor expansion of γ_{1}=γ_{2}=0.
where αG=α_{1}G_{1}+α_{2}G_{2}, γG=γ_{1}G_{1}+γ_{2}G_{2}, k_{0}(t)=ωτh_{0}(t) is a baseline hazard function with \(h_{0}(t)=\frac {\partial H_{0}(t)}{\partial t}\) and Υ(t) is a function of γ_{1};γ_{2};α_{1};α_{2};G_{1};G_{2}. This multiplicative hazard formulation will be used in the following.
The proposed statistic
In this work, the general null hypothesis to be tested is defined by H_{0}:β=γ_{1}=γ_{2}=α_{1}=α_{2}=0 and corresponds to a same survival distribution across genotypes against differences that can be associated to changes linked either to the preimmune fraction (β≠0) or to the improper survival distribution for the non preimmune subjects. From our mechanistic model, these differences are related either to the dispersion of the latent distribution for the Bcell clones (γ_{1}≠0 or γ_{2}≠0) or to the ADA affinity (α_{1}≠0 or α_{2}≠0).
In the following, we derive under the null hypothesis a score statistic for testing H_{0}.
The loglikelihood \(LL_{0}\left (\beta,\gamma _{1},\gamma _{2},\alpha _{1},\alpha _{2};G_{1_{i}},G_{2_{i}},Z_{i}\right)\) derived under the working model can be expressed as the sum of two terms LL_{0}=LL_{1}+LL_{2} where the first term involves only the parameter β whereas the second term involves the parameters γ_{1},γ_{2},α_{1};α_{2}.
\(LL_{1} = \sum _{i=1}^{n}Z_{i}\left \{\theta +\beta G_{1_{i}}log\left (1+e^{(\theta +\beta G_{1_{i}})}\right)\right \} + (1Z_{i}) \left \{log\left (1+e^{(\theta +\beta G_{1_{i}})}\right)\right \}\)
\(LL_{2}= \sum _{i=1}^{n} (1Z_{i})\delta _{i} \left [log(k_{0}(t))+\Upsilon (t)\right ]  (1Z_{i}) \int _{0}^{\infty } k_{0}(s) e^{\Upsilon (s)}ds\)
If we consider the nonparametric form for k_{0}(t) that puts point mass k_{0j} at the J ordered failure times t_{(j)}, then \(K_{0}(t)=\sum _{j=1}^{J} k_{0j}(t_{(j)} \leq t)\), as our improper survival model is a multiplicative model, the profile likelihood (LL2) (profile out K_{0}) leads to the classical log partial likelihood function from which a score test can be derived under the null hypothesis [15, 16]. Here the logpartial likelihood is: \(LPL_{2}= \sum _{i=1}^{n} (1Z_{i}) \delta _{i} \Upsilon (t_{i})  \sum _{j=1}^{n}Y_{j}(t_{i}) (1Z_{i})e^{\Upsilon (t_{i})}\).
Thus, one can obtain two score statistics by separately computing the components of the score statistics derived from the first derivatives of LL_{1} and the log partial likelihood LPL_{2} with respect to β and α_{1},α_{2},γ_{1},γ_{2} evaluated under H_{0}, respectively.
The first score statistic is as follows:
\(\hat {V}_{H_{0},\beta }=\frac {\partial LL_{1}}{\partial \beta } = \sum \limits _{i=1}^{n}G_{1_{i}}\left \{Z_{i}\pi _{0} \right \}\)
with \(\pi _{0}=\frac {e^{\theta }}{1+e^{\theta }}\) the probability of being a preimmune under the null hypothesis H_{0}.
The components of the second score statistic are such as:
where \(\phantom {\dot {i}\!}W_{1}(t)=12\big (\frac {\Lambda (t)}{\omega }\big)\) and \(\phantom {\dot {i}\!}W_{2}(t)=1\big (\frac {\Lambda (t)}{\omega }\big)\) with Λ(t) as the bounded cumulative hazard function under the null hypothesis H_{0}.
For computing these two score statistics, we should substitute Λ(t), ω and π_{0} by efficient estimators computed under the null hypothesis. Here \(\hat {\pi _{0}}\) is the observed proportion of preimmune subjects computed under the null hypothesis. The quantity \(\phantom {\dot {i}\!}\hat {\Lambda }(t)=\sum _{i=1}^{n}\int _{0}^{t}\{\sum _{k=1}^{n}Y_{k}(s)\}^{1}dN_{i}(s)\), where \(\phantom {\dot {i}\!}N_{i}(t)=1_{\{X_{i}\leq t,\delta _{i}=1\}}\) is the leftcontinuous version of the NelsonAalen estimator for the cumulative hazard [15, 17] obtained by using the pooled sample, and \(\hat {\omega }=\hat {\Lambda }(t_{\max })\) is the maximum value of this estimator computed at the last observed failure time t_{max}. Here, \(\hat {\omega }\) is an efficient estimator of ω since we assume that the immunereactive patients should experience the event within the oneyear followup.
For the two scores, their corresponding information scalar and matrix, respectively \(\hat {J}_{H_{0}}\) and \(\hat {I}_{H_{0}}\), are obtained as presented in the Supplementary material. They are are computed by using efficient estimators of W_{1}(t) and W_{2}(t) as given above. Then,
\(S_{1}=\hat {V}_{H_{0}}{\hat {J}_{H_{0}}}^{1}\hat {V}_{H_{0}}\) and \(S_{2}=\left (\hat {U}_{H_{0},\alpha _{1}},\hat {U}_{H_{0},\alpha _{2}},\hat {U}_{H_{0},\gamma _{1}},\hat {U}_{H_{0},\gamma _{2}}\right) \hat {I}_{H_{0}}^{1}\\ \left (\hat {U}_{H_{0},\alpha _{1}},\hat {U}_{H_{0},\alpha _{2}},\hat {U}_{H_{0},\gamma _{1}},\hat {U}_{H_{0},\gamma _{2}}\right)^{^{\prime }}\).
Thus, the statistic : \(S_{H_{0}}=S_{1}+S_{2}\) is asymptotically distributed under H_{0} as a chisquare with five degrees of freedom. This test statistic is referred as TWIST, for TWopart Improper Survival Test in the following
Statistical test comparison
We compared our proposed test to a Twopart LogRank Test, hereafter referred as TLRT. Here, the TLRT is derived in the same spirit as proposed by Lachenbruch (2001)[7] where the first part is the score test under the logistic model and the second part is a ksample logrank test. Here, k=3 for the three genotype groups.
Results
Data generation
Monte Carlo simulations were performed in order to evaluate the behavior of the TWIST. We reported the size of the proposed test, as well as its power properties with those obtained with a TLRT. Individual’s genotype information was generated by summing the values of two Bernoulli variables independently drawn, with mean: 0.1, 0.2 and 0.3. This value corresponds to a pseudominor allele frequency (MAF), that is to say the proportion of [a] allele in the simulated population.
The status for being preimmune was sampled from a Bernoulli variable, with mean of 0.05, 0.1 or 0.2. The values for β are chosen so that the preimmune fractions are equal across the different groups (then β=0), or different according to the group (\(\beta = log(2), log(\frac {1}{2})\)).
For the non preimmune and to evaluate the impact of departures from the reference latent distribution, we have considered scenarii with equi, under or overdispersed latent distributions corresponding to Poisson, Binomial and Negative binomial distributions, respectively.
Data were generated such as, for the reference group with [AA] genotype, S_{AA}(t)= exp[−ω(1−(1+t)^{−1})] and for the two other genotypes [aa] and [Aa], \(S(t) = \left [\frac {1(\gamma _{1} G_{1} + \gamma _{2} G_{2})\left (1+ t e^{\alpha _{1} G_{1} + \alpha _{2} G_{2}}\right)^{1}}{1(\gamma _{1} G_{1} + \gamma _{2} G_{2})}\right ]^{\frac {\omega }{\gamma _{1} G_{1} + \gamma _{2} G_{2}}}\).
The tail defect for the reference group was chosen such as: S_{AA}(∞_{+})=e^{−ω}=0.50. We also investigated the case where S_{AA}(∞_{+})=e^{−ω}=0.30.
The values for γ_{1} and γ_{2} were chosen accordingly to have over, equi or underdispersion, under an additive, dominant, recessive or overdominant genetic model. The values for α_{1} and α_{2} were chosen in such a way as to the risk \(e^{\alpha _{1} G_{1} + \alpha _{2} G_{2}} = 2\phantom {\dot {i}\!}\). Thus, when the genetic model is dominant, α_{1}=0.5×log(2) and α_{2}=0.5×log(2); when the genetic model is additive, α_{1}=0.5×log(2) and α_{2}=0; when the genetic model is recessive, α_{1}=0.5×log(2) and α_{2}=−0.5×log(2) and when the genetic model is overdominant, α_{1}=0 and α_{2}=log(2).
According to the model, α and γ have the same directional effects when they are of opposite signs. Thereby, when γ<0 and e^{α}>1 there is a global underdispersion, and the reverse for γ∈]0−1] and e^{α}<1.
The censoring time C was generated from a uniform distribution with parameter value computed from the chosen percentage of censoring. The percentage of censoring refers only to the percentage of censored observations without the tail defect fraction. We investigated no censoring and 20% censoring.
The total number of subjects was chosen to be 1,000 and for each configuration of parameters, 1,000 replications were performed. The levels and powers of the TWIST and the TLRT were estimated with a 0.05 significance level.
Simulation results
The estimated level of the TWIST under its proper null hypothesis fell within the binomial range [0.036−0.064] (see the underlined values in Table 2b and in Tables S1 to S6 in the Supplementary material).
Just below, we comment the results of Table 2. It displays the estimated power gains in the TWIST and the TLRT for different values of the parameters α, β and γ, for a 10% proportion of preimmune subjects and a minor allele frequency of 20%. In Table 2, both α and γ have additive effects, hence, α_{2}=0 and γ_{2}=0. The results for other genetic models are available in the Supplementary material.
We first comment the configurations where α has no effect (e^{α}=1, Table 2b). In this case, the TWIST’s power is higher than the TLRT’s when γ_{1}=−0.35 (underdispersion). The TWIST’s power is very close to the TLRT’s when γ_{1}=0 or γ_{1}=0.10. In these cases, the difference between the TWIST’s power and the TLRT’s vary from 0 to 0.02. When only β has an effect, the TWIST has a very low power. For example, when e^{β}=0.5, the TWIST’s power is 0.04.
We will now comment the configurations where α has a nonnull effect (Table 2a and c).
When parameters α and γ have the same directional effect, the TWIST’s power is always higher than the TLRT’s, with a difference varying from 0.18 to 0.57. As an example, when both α and γ increase the risk and e^{β}=2, the TWIST’s and TLRT’s powers are 0.95 and 0.56 respectively.
When γ has no effect, the TWIST’s power is higher than the TLRT’s, with a difference varying from 0.26 to 0.34.
When α and γ have opposite directional effect, the TWIST’s power values are close to the TLRT’s. The difference between the powers of the two tests varies from −0.10 to 0.10. As an example, when γ_{1}=0.10, e^{α}>1 and e^{β}=2, the TWIST’s and TLRT’s powers are 0.49 and 0.42 respectively.
As expected, when e^{β}=1, the power of the TWIST is inferior to the one obtained with e^{β}≠1.
The results of simulations with α and γ acting under a same genetic model other than additive are not displayed in this section but are joined in the Supplementarymaterial (Table S6). The TWIST’s power values obtained with α and γ both dominant, overdominant or recessive are different from what we described above but follow the same global trends. Moreover, according to the genetic model, the power values are different. When both α and γ are dominant or overdominant, the TWIST’s power values are higher than the ones obtained under an additive genetic model. When both α and γ are recessive, the TWIST’s power values are lower than what is obtained under an additive genetic model.
We performed simulations with the same parameters as described above but with 20% censoring (see Table S1 in the Supplementary material). As expected, the TWIST’s power values when there is 20% censoring are lower than when there is no censoring. Moreover, the difference between the TWIST’s power and the TLRT’s is lower with 20% censoring compared to no censoring.
Tables S2 to S6 in the Supplementary material display the results of simulations performed with all parameters equal to those of Table 2, unless otherwise specified. For all these configurations, the trends are similar to what we detailed above about Table 2, with changes in power values and gains. We describe briefly just below the results displayed in those additional tables.
With a minor allele frequency of 10%, as expected, the TWIST’s power is lower than when the MAF is 20%, and with a 30% MAF, it is higher (see Tables S4 and S5).
With a proportion of preimmune subjects in the reference group of 5% or 20%, the TWIST’s power is similar to what we obtain when this proportion is 10%, with slight variations according to the configuration of the parameters (see Additional file 1: Tables S4 and S5).
When the tail defect is 30%, the TWIST’s power is higher than when the tail defect is 50% (see Table S6).
We also performed simulations with α having a different genetic model than γ, those results are displayed in Tables S7 and S8 of the Supplementary material and display similar trends than what is described above, with different power values according to the genetic model.
Biotherapeutic immunogenicity study
We used our novel test to look for association between genetic variants and occurrence of ADA. The population study consists of 469 patients from the ABIRISK consortium cohort [11] suffering from autoimmune diseases and treated by biotherapies. These patients were naive for the biotherapies before the study and were followed during twelve months. The outcome was the time between the date of the first dose of biotherapy and the first detection of antidrug antibodies (ADA). Patients without ADA occurrence were censored at the date of their last clinic visit. Among the 469 subjects, 17 (3.6%) had preexisting ADA and 129 (27.5%) developed ADA during the oneyear followup.
Exploratory analyses were performed on a list of 1,697 genes linked to immunity. We selected the genetic variants with a minor allele frequency greater than 0.15, for a total of 19,745 variants. At a 10% false discovery rate level [18], five genetic variants were significantly associated with ADA occurrence with the TWIST, and none with the TLRT. Among these five variants, we highlight one from the fibroblast growth factor signaling pathway which test statistic is 28.96 (qvalue =0.093) with the TWIST and 21.87 with the TLRT (qvalue =0.34). Taken separately, the first and second components of the TWIST statistic have values of 8.02 and 20.94 respectively. It’s worth noting that if we restricted the analyses to the 452 non preimmune subjects with the same 10% false discovery rate, this genetic variant was not selected (qvalue =0.39).
This latter finding emphasizes the use of the TWIST and the inclusion of the preimmune subjects for the analyses. Figure 1 shows the survival curves for the three genotypic groups of the highlighted variant. We found that the rare variant is associated with a higher occurrence of ADA in both preimmune and non preimmune subjects.
Discussion
In this paper, we proposed a test for assessing the effect of genetic markers on drug imunogenicity. This test is based on a score statistic derived from a twopart semiparametric improper survival model which allows to test the effect of genetic markers on the timetodetection of ADA. It extends classical twopart models for semicontinuous data to the case where we can expect a mass at infinity. This model has a mechanistic interpretation of the ADA production by the Bcell clones. It allows different types of departures from equality of the twopart improper survival distributions. The biological interpretation of these departures are linked either to the proportion of preimmune subjects or to the dynamic of ADA production. In this latter case, genetic variants can be associated with changes in the number of clones or to their ADA affinities. As seen from our simulations, under the null hypothesis, our proposed test maintains a correct type I error in all the investigated configurations. Looking at the simulation results under the alternative hypotheses, we see that power gains of the proposed test are better than that of the twopart logrank test in most situations. In particular, we see that striking gains in power can be achieved by the TWIST as compared to the twopart logrank test when the parameters have the same directional effect, which is the case that we could expect the most in applications on real data. Since for complex diseases the underlying model is unknown, we have not restricted our test to one of the common genetic models. For various combinations of the classical genetic models, we obtain good performances with the proposed test. Moreover, it is worth noting that a straightforward extension of this test statistic can be done to investigate specific genetic models. Power gains decrease when the plateau or the percentage of censoring increase, which is not surprising since there are less events. Likewise, when the groups are too unbalanced, for example when the minor allele frequency is low or when the underlying genetic model is recessive, power gains decrease. The simulation results also show that our test performs well when there is a non negligible proportion of preimmune subjects.
Notwithstanding the good behavior of the proposed test, some limitations should be mentioned. In this work, we have considered a proportional effect of the studied variable on the timetodetection associated with the Bcell clones. However, if this hypothesis is not true (e.g. in case of accelerated failure time), the power of the test will be reduced. Here, we have considered a Katz family for the distribution of the number of latent Bcell clones. This modelisation provides a flexible probabilistic modeling approach but at the expense of increasing the number of parameters to be tested. Moreover, it is worth keeping in mind that this test performs best in situations where a sufficient length of followup is provided. It means that we need a window of observation long enough for the last potential event among immune reactive to occur within it. In other words, we should provide a followup adequate for detecting the presence of immunetolerant subjects in the study population. If this condition is unmet, the power of the test is reduced.
We used the proposed test for analyzing the relationship between some genetics loci and drug immunogenicity among a cohort of patient with autoimmune diseases and naive for the tested drugs. In this study, 3.6% of the patients were preimmune subjects. We found that the alternative alleles of five variants on immunity genes are significantly associated with an increased risk of developing ADA. The result for one variant from the fibroblast growth factor signaling pathway is interesting since it has an effect on both the fraction of preimmune subjects and the time of ADA occurrence. For this case, the inclusion of the preimmune subjects in the analyses allows to detect the association of the variant with ADA occurrence.
Thus, we think that our test provides a new way of assessing the effect of genetic markers on drug immunogenicity, while incorporating some biological understanding of the ADA production. Our modelbased approach can be considered as a convenient representation of complex patterns of genetic effects which allows detection of departure from the null hypothesis model.
It is worth noting that since the proposed test statistic gives equal weights to its two parts, its global power can be reduced when one of them has a smaller effect size (e.g. preimmune subjects). Following the idea introduced in the paper of Hu and Proschan [19], it could be interesting to consider a linear combination of the two components of our proposed test statistic. As in the work of Hu and Proschan, we could assign a lesser weight to the component with the smaller a priori effect than to the other. However, it requires additional work for deriving the probability density function of the optimal linear combination of the two components of the test statistic. Moreover, further works should also be done for deriving simple test statistics for testing separately each component for the non preimmune subjects.
In view of the gains in power obtained by the proposed test, its use can be recommended in many clinical situations where unwanted drug immunogenicity occurs such as in oncology and clinical immunology. It can also be used to evaluate response to vaccination for populations where preimmune and non preimmune patients exist. Moreover, the proposed test can also be extended to take other factors into account such as different treatments, by developing a stratified version with strata defined by the levels of the factors.
It can also be used to evaluate response to vaccination for populations where preimmune and non preimmune patients exist. Moreover, the proposed test can also be extended to take other factors into account such as different treatments, by developing a stratified version with strata defined by the levels of the factors.
Conclusion
In this study, we proposed a novel test statistic for assessing the effect of genetic markers on drug immunogenicity taking into account that the population under study is a mixed one. This test statistic, which is easy to implement with standard softwares, is also applicable in situations where one wants to test the equality of improper survival distributions of semicontinuous outcomes between two or more independent groups.
Availability of data and materials
The data analyzed in this study were collected in the context of the ABIRISK project by ABIRISK partners. Access to the minimal dataset underlying the findings can be obtained by interested researchers upon request to the ABIRISK Sustainability Scientific Committee by submission of an analysis plan. The analysis plan should explain the purpose of the use of the data and confirm the intention to use the data only for replication studies concerning antidrug inhibitors, since this is the limitation of the ethical permission on how this data can be used. The contact person of the ABIRISK Sustainability Scientific Committee to whom the requests should be sent is Marc Pallardy (marc.pallardy@inserm.fr).
Abbreviations
 ADA:

Antidrug antibodies
 BP:

Biopharmaceutical products
 MAF:

Minor allele frequency
 TLRT:

Twopart logrank test
 TWIST:

Twopart improper survival test
References
 1
Van de Weert M, Horn Moller E. Immunogenicity of biopharmaceuticals. New York: Springer; 2008.
 2
Pineda C, Castañeda Hernàndez G, Jacobs I, Alvarez D, Carini C. Assessing the immunogenicity of biopharmaceuticals. BioDrugs. 2016; 30(3):195–206.
 3
Shankar G, Arkin S, Cocea L, Devanarayan V, Kirshner S, Kromminga A, et al.Assessment and reporting of the clinical immunogenicity of therapeutic proteins and peptidesharmonized terminology and tactical recommendations. AAPS J. 2014; 16(4):658–73.
 4
Rup B, Pallardy M, Sikkema D, Albert T, Allez M, Broet P, et al.Standardizing terms, definitions and concepts for describing and interpreting unwanted immunogenicity of biopharmaceuticals: recommendations of the Innovative Medicines Initiative ABIRISK consortium. Clin Exp Immunol. 2015; 181(3):385–400.
 5
Xue L, Rup B. Evaluation of preexisting antibody presence as a risk factor for posttreatment pntidrug antibody induction: analysis of human clinical study data for multiple biotherapeutics. AAPS J. 2013; 15(3):893–6.
 6
Farewell V, Long D, Tom B, Yiu S, Su L. TwoPart and related regression models for longitudinal data. Annu Rev Stat Appl. 2017; 4:283–315.
 7
Lachenbruch P. Comparisons of twopart models with competitors. Stat Med. 2001; 20(8):1215–34.
 8
Maller R, Zhou X. Survival analysis with longterm survivors. New York: Wiley; 1997.
 9
Yakovlev A, Tsodikov A. Stochastic models of tumor latency and their biostatistical applications. Singapore: World Scientific; 1996.
 10
Tsodikov A, Ibrahim J, Yakovlev A. Estimating cure rates from survival data: an alternative to twocomponent mixture models. J Am Stat Assoc. 2003; 764(98):1063–78.
 11
ABIRISK: AntiBiopharmaceutical Immunization: prediction and analysis of clinical relevance to minimize the RISK. http://www.abirisk.eu/. Accessed 14 May 2019.
 12
Fleming T, Harrington D. Counting processes and survival analysis. New York: Wiley; 1991.
 13
Johnson N, Kemp A, Kotz S. Continuous univariate distributions. New York: Wiley; 2005.
 14
Cooner F, Banerjee S, Carlin BP, Sinha D. Flexible cure rate modeling under latent activation schemes. J Am Stat Assoc. 2007; 102(478):560–72.
 15
Andersen P, Borgan O, Gill R, Keiding N. Statistical models based on counting processes. New York: Wiley; 1993.
 16
Cox D. Partial likelihood. Biometrika. 2001; 62(2):269–76.
 17
Nelson W. Theory and applications of hazard plotting for censored failure data. Technometrics. 1972; 14:945–65.
 18
Dalmasso C, Broet P, Moreau T. A simple procedure for estimating the false discovery rate. Bioinformatics. 2005; 21:660–8.
 19
Hu Z, Proschan M. Twopart test of vaccine effect: Z. HU AND M. PROSCHAN. Stat Med. 2015; 34(11):1904–11. http://doi.wiley.com/10.1002/sim.6412.
Acknowledgments
This research work has been motivated by questions raised by studies conducted as part of the ABIRISK consortium (AntiBiopharmaceutical Immunization: Prediction and analysis of clinical relevance to minimize the risk, Innovative Medicines Initiative). The methodological research was conducted as part of sustainability projects of the ABIRISK consortium.
We thank the reviewers for their valuable and supportive comments.
Funding
The research of the first author was supported by a grant from University ParisSaclay (France). The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Affiliations
Consortia
Contributions
JD and PB developed the original test statistic. PB coordinated the project and is JD’s PhD thesis advisor. JD, MC and PB participated in writing the original draft. JD, SH, DB and PB analyzed the data. SH, DB, MA, FD, AFH, AG, SHBA, XM, MP and PB participated in the collection of the data. MP coordinated the ABIRISK project. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
The ethics approvals for the ABIMSP01 study (EudraCT Number 201200545030) were obtained from Medical Ethics Committee of the General University Hospital in Prague (reference 125/12, Evropský grant 1.LF UKCAGEKID) for Czech Republic, Institutional committee of Heinrich Heine University, Düsseldorf, Germany (protocol reference 4451) for Düsseldorf, from Ethikkommission der Fakultät für Medizin der Technischen Universität München, München, Germany (reference 335/13) for München, from Ethikkommission Nordwest und Zentralschweiz, Basel (reference 305/13) for Switzerland, from Ethikkommission der Medizinischen Universität Innsbruck, Innsbruck (reference AN20130040 331/2.1) for Austria, from Comité Ético de Investigación Clìnica de l’Hospital Universitari Vall d’Hebrón, Barcelona (reference EPA(AG)66/2013(3866)) for Spain, from Stockholm Regional Ethics Committee, Stockholm (reference Dnr. 2013/103431/3 and Dnr. 2015/74932) for Sweden. The ethics approvals for the ABIRAP01 study (ClinicalTrials.gov, NCT02116504) were obtained from Comité de Protection des Personnes Ile de France VII (reference 13048) for France, from the Medical Ethical Committee of the Academisch Medisch Centrum, Amsterdam (reference 2013304#B20131074) for the Netherlands, from the Local Ethics Committee of AOU Careggi (reference Protocol N^{o} 2012/0035982) for Italy, from NRES Committee London, City & East (reference 14/LO/0506) for the United Kingdom. The ethics approvals for the ABIIBDP01 study were obtained from Comité de Protection des Personnes Ile de France IV (reference 2013/24) for France, from Comité d’éthique hospitalofacultaire universitaire de Liège (reference 2015/55) for Belgium. All participants gave their written informed consent.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
Supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Duhazé, J., Caubet, M., Hässler, S. et al. Assessing the effect of genetic markers on drug immunogenicity from a mechanistic modelbased approach. BMC Med Res Methodol 20, 69 (2020). https://doi.org/10.1186/s1287402000941z
Received:
Accepted:
Published:
Keywords
 Genetic
 Drug immunogenicity
 Semicontinuous data
 Twopart improper survival model
 Semiparametric