Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancer
 Jing Zhai^{1},
Chiu-Hsieh Hsu^{1} and
Z. John Daye^{1}
DOI: 10.1186/s12874-017-0291-y
© The Author(s) 2017
Received: 19 June 2016
Accepted: 6 January 2017
Published: 25 January 2017
Abstract
Background
Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographical or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates.
Methods
In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software.
Results
The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer identified 24 genes, of which 2 were selected only by the ridle among current methods and found to be associated with tumor grade.
Conclusions
In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.
Keywords
Gene expression analysis; Lasso; Linear models; Penalized regression; Ridge; Variable selection
Background
Many essential problems in statistical genomics may be formulated in terms of variable selection of candidate biological factors for modeling of some trait or quantity of interest [1–3]. Often, additional covariates describing clinical, demographical, or other experimental factors must be included a priori as mandatory covariates while allowing the selection of possibly a large number of candidate or optional variables. Substantial progress has been made recently in the analysis of high-dimensional data with sparse regression methods. The lasso [4] induces sparsity using an L1-norm penalty on all coefficients. With the introduction of computationally efficient algorithms [5, 6], the lasso has since become a widely applied variable selection method. Other methods for sparse regression include the smoothly clipped absolute deviation (SCAD) [7], the adaptive lasso [8], and the Dantzig selector [9]. However, these methods were not designed for applications with mandatory covariates. An ad hoc approach is often employed in which the response is regressed on the mandatory covariates without penalization, as in ordinary least squares (OLS), while penalized regression is applied to the optional variables, independently of the mandatory ones, to achieve variable selection. However, standard statistical principle advocates the consideration of all covariates simultaneously in order to account for complex dependencies among covariates. By penalizing coefficients on some of the variables while leaving others unpenalized, this approach can yield both poor prediction accuracy and unreliable selection of optional variables. As mandatory covariates are routinely encountered in genomic-data analysis, it is of interest to develop a principled approach towards sparse regression with mandatory covariates. In this article, we consider the problem of efficient estimation of coefficients of mandatory covariates and simultaneous variable selection of optional variables.
Cancer arises as a disorder of the cell life cycle that leads to excessive cell proliferation and poor differentiation. Pathologists often use grading systems to measure the degree of cell differentiation in tumors [10, 11]. Tumor grade is one of the most important indicators used by clinicians to guide treatment options and determine prognosis for patients [12]. Histologic grade of breast cancer is representative of its aggressive potential [13]. Cancer cells with higher grades tend to be more aggressive and require quite different treatment strategies than those with lower grades. Due to the importance of tumor grade as an essential measure in the clinical prognosis, treatment, and survival of breast cancer patients, understanding genetic factors that may be predictive of tumor grade has become a desideratum of current research in breast cancer. In this article, we propose a principled method to identify genes that may affect tumor grade while accounting for clinical phenotypes, such as age at diagnosis and p53 sequence mutation status, by incorporating them as mandatory covariates.
We propose the ridge-lasso hybrid estimator (ridle), a novel penalized regression procedure that can simultaneously estimate coefficients of mandatory covariates while allowing selection for others. The ridle employs the L2-norm penalty to estimate mandatory coefficients and the L1-norm penalty to perform variable selection on the optional set. The L2-norm penalty has been successfully employed in ridge regression to efficiently estimate coefficients under a spectrum of dependency structures [14–16]. In this article, we provide theoretical, simulation, and real-data analyses that suggest the ridle as an efficient method for sparse regression with mandatory covariates. In particular, we show that the ridle can achieve improved prediction accuracy and variable selection under commonly encountered scenarios in which (1) the mandatory covariates are highly correlated among themselves or (2) the mandatory variables are correlated with the optional ones.
The rest of the article is organized as follows: the “Methods” section introduces the ridle procedure, presents an efficient algorithm, and provides theoretical results that suggest the efficacy of the ridle for sparse regression with mandatory predictors. The “Results” section evaluates our method on simulated data. Further, we apply our method to a gene selection analysis of microarray data, where we identified more genes in breast-cancer-related pathways with the ridle. Additionally, the ridle is the only method that identified the two genes AREG and TRPM4, from the ErbB signaling pathway and ion-channel family, respectively, which are known to be related to cancer. Further discussions are provided in the “Discussion” section, and we conclude with the “Conclusions” section.
Methods
The Ridle
The ridle estimator is defined as

\(\hat{\boldsymbol{\beta}}(\lambda_1, \lambda_2) = \underset{\boldsymbol{\beta}}{\arg\min}\, \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda_1 \sum_{j \in \mathcal{O}} |\beta_j| + \lambda_2 \sum_{j \in \mathcal{M}} \beta_j^2, \qquad (1)\)

where \(\mathcal{O}\) and \(\mathcal{M}\) are nonintersecting subsets of the indices \(\mathcal{I} = \{1, 2, \ldots, d\}\) such that \(\mathcal{O} \cup \mathcal{M} = \mathcal{I}\). Subsets \(\mathcal{O}\) and \(\mathcal{M}\) comprise, respectively, indices of optional and mandatory variables. The ridle penalizes coefficients in \(\mathcal{O}\) by the L1-norm penalty and coefficients in \(\mathcal{M}\) by the L2-norm penalty. It allows variable selection, as in the lasso, for predictors in \(\mathcal{O}\) and estimation without selection, as in the ridge, for predictors in \(\mathcal{M}\). If λ_2 is equal to 0, the ridle is equivalent to thresholding the coefficients of some predictors for variable selection and estimating the rest without penalization. As the lasso penalty is applied to optional variables while no penalization is imposed on coefficients in \(\mathcal{M}\), we call this special case of the ridle with λ_2=0 the \(\mathcal{M}\)-unpenalized lasso.
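For concreteness, the objective in (1) can be written as a short R function. This is an illustrative sketch, not the authors' published software; the function and argument names (`ridle_objective`, `opt`, `man`) are ours.

```r
# Sketch of the ridle objective in (1): residual sum of squares plus an
# L1-norm penalty on optional coefficients and an L2-norm penalty on
# mandatory ones. 'opt' and 'man' are integer index vectors for O and M.
ridle_objective <- function(beta, X, y, opt, man, lambda1, lambda2) {
  sum((y - X %*% beta)^2) +
    lambda1 * sum(abs(beta[opt])) +
    lambda2 * sum(beta[man]^2)
}
```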
For further insight, we examine the ridle estimator under two special situations.
Ridle estimator in two special cases
Orthogonal design case
Suppose the predictors are standardized so that \(\mathbf{X}^T\mathbf{X} = n\mathbf{I}\). Then the ridle estimates take the closed forms \(\hat{\beta}_j(\text{ridle}) = \text{sign}(\hat{\beta}_j(\text{ols}))\,(|\hat{\beta}_j(\text{ols})| - \lambda_1/(2n))^{+}\) for \(j \in \mathcal{O}\) and \(\hat{\beta}_j(\text{ridle}) = n\,\hat{\beta}_j(\text{ols})/(n + \lambda_2)\) for \(j \in \mathcal{M}\), where (·)^{+} denotes the positive part of the value, such that the expression is set to 0 for negative quantities. The ridle estimates equate to those of the lasso for \(j \in \mathcal{O}\) and the ridge for \(j \in \mathcal{M}\). When λ_2=0, the \(\mathcal{M}\)-unpenalized lasso estimates equate to those of the lasso for \(j \in \mathcal{O}\) and the OLS for \(j \in \mathcal{M}\). It is clear that, when the design matrix is orthogonal, the L1-norm and L2-norm penalties work independently to penalize coefficients with indices in \(\mathcal{O}\) and \(\mathcal{M}\), respectively. The situation is more involved when predictors are correlated.
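Before turning to correlated predictors, these orthogonal-design forms can be sketched directly in R. The snippet below is ours, not the published software, and assumes predictors standardized so that \(\mathbf{X}^T\mathbf{X} = n\mathbf{I}\), the convention consistent with the two-predictor formulas below.

```r
# Closed-form ridle estimates under an orthogonal design (sketch; assumes
# the columns of X satisfy t(X) %*% X == n * diag(ncol(X))).
ridle_orthogonal <- function(X, y, opt, man, lambda1, lambda2) {
  n <- nrow(X)
  b_ols <- drop(crossprod(X, y)) / n   # OLS estimates under orthogonality
  b <- b_ols
  # Lasso-type soft-thresholding for optional coefficients
  b[opt] <- sign(b_ols[opt]) * pmax(abs(b_ols[opt]) - lambda1 / (2 * n), 0)
  # Ridge-type proportional shrinkage for mandatory coefficients
  b[man] <- n * b_ols[man] / (n + lambda2)
  b
}
```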
Two-predictor case
In Fig. 1a, we obtain the lasso solution as an ellipse hits a corner of the lasso penalty contour, setting β_1 to 0. In Fig. 1c, we see that the ridge penalty contour is circular, and an ellipse hitting the penalty contour gives nonzero estimates. The ridle penalty is depicted in Fig. 1b. It has characteristics of both the lasso and the ridge, with an oval shape along the horizontal axis and sharp corners on the vertical axis. The ridle solution occurs when an ellipse centered on the OLS estimates hits a sharp corner on the vertical axis, yielding β_1=0 and a nonzero β_2. Thus, we see that the ridle may provide sparse solutions for coefficients in \(\mathcal{O}\) while preserving nonsparsity for coefficients in \(\mathcal{M}\).
where \(s_1 = \text{sign}(\hat{\beta}_1(\text{ols}))\), \(\theta_1 = \lambda_1(n+\lambda_2)/(2n(n+\lambda_2-n\rho^2))\), and \(\theta_2 = n(1-\rho^2)/(n+\lambda_2-n\rho^2)\). We see that the coefficient \(\hat{\beta}_1\) can be thresholded to 0 with increasing θ_1, and the functions θ_1(λ_1,λ_2,ρ) and θ_2(λ_2,ρ) act to increase (temper) the thresholding of \(\hat{\beta}_1\) when \(\rho s_1 \hat{\beta}_2(\text{ols})\) is negative (positive). On the other hand, \(\hat{\beta}_2\) converges to a weighted average of \(\hat{\beta}_1(\text{ols})\) and \(\hat{\beta}_2(\text{ols})\), without necessarily being thresholded to 0, as θ_1 increases to \(\hat{\beta}_1(\text{ols}) + \rho\lambda_2\theta_2 s_1 \hat{\beta}_2(\text{ols})/(n - n\rho^2)\).
In the special case when λ_2=0, the ridle reduces to the \(\mathcal{M}\)-unpenalized lasso, with estimates \(\hat{\beta}_1(\mathcal{M}\text{-unpenalized lasso}) = s_1(|\hat{\beta}_1(\text{ols})| - \theta_1)^{+}\), \(\hat{\beta}_2(\mathcal{M}\text{-unpenalized lasso}) = \hat{\beta}_2(\text{ols}) + s_1\rho\theta_1\) if \(\theta_1 < |\hat{\beta}_1(\text{ols})|\), and \(\hat{\beta}_2(\mathcal{M}\text{-unpenalized lasso}) = \hat{\beta}_2(\text{ols}) + \rho\hat{\beta}_1(\text{ols})\) otherwise. Under multicollinearity, when ρ is large, the OLS estimates are known to have large variability. In this case, the ridge is often employed to improve prediction accuracy by regulating variances. Compared with the ridle, the \(\mathcal{M}\)-unpenalized lasso, which imposes no penalization on mandatory coefficients, can be less effective in tempering the effects of multicollinearity. For example, when ρ=1 and θ_1 is large, the \(\mathcal{M}\)-unpenalized lasso estimate for β_2 is \(\hat{\beta}_2(\text{ols}) + \hat{\beta}_1(\text{ols})\), such that the \(\mathcal{M}\)-unpenalized lasso can have larger prediction error than the OLS.
The lasso has the solution \(\hat{\beta}_j(\text{lasso}) = s_j(|\hat{\beta}_j(\text{ols})| - \gamma)^{+}\) for j=1,2 and does not involve the correlation ρ when d=2 [4]. In contrast, ridge coefficients tend to be averaged with increasing correlation. This property helps the ridge to reduce the variances of its estimates and improve prediction accuracy when data are multicollinear [16]. The ridle estimates \((\hat{\beta}_1, \hat{\beta}_2)\) are also defined in terms of weighted averages of \(\hat{\beta}_1(\text{ols})\) and \(\hat{\beta}_2(\text{ols})\) according to the correlation ρ. In the following, we show via theoretical studies how this property can improve variable selection for the ridle.
Theoretical properties
In this section, we provide theoretical properties of the ridle estimator. These results are useful in providing a window to understanding the proposed method and a guide as to how the method might perform in practice. Here, we use the sign-consistency approach [17] for theoretical derivations, which provides results that are easy to interpret and relate to applications. More involved theoretical approaches, such as the asymptotic and non-asymptotic oracle properties [7, 18], often rely on complex conditions that are difficult to interpret. Proofs for the theoretical results in this section are provided in Additional file 1.
Asymptotic normality of the Ridle
Theorem 1
and W∼N(0,σ ^{2} C).
Theorem 1 shows that the coefficients of mandatory covariates can contribute to the bias of the ridle estimates. If the coefficients of variables in \(\mathcal{M}\) are relatively large, then a small c_2 is required to keep the bias low. On the other hand, when coefficients of variables in \(\mathcal{M}\) are small, a wider spectrum of values of c_2 can be chosen to improve prediction accuracy. Hence, we expect the ridle to perform best when coefficients of mandatory variables tend to be small.
Variable selection consistency of the Ridle
Sign consistency, as a stronger condition, directly implies variable selection consistency.
With (5)–(7), we give the following conditions for sign consistency of the ridle estimator.
where \(\mathbf{D}^{n} = (\mathbf{C}^{n}_{21} - \mathbf{C}^{n}_{23}(\tilde{\mathbf{C}}^{n}_{33})^{-1}\mathbf{C}^{n}_{31})(\mathbf{C}^{n}_{11} - \mathbf{C}^{n}_{13}(\tilde{\mathbf{C}}^{n}_{33})^{-1}\mathbf{C}^{n}_{31})^{-1}\).
Theorem 2
Under (5)–(7), \(\hat{\boldsymbol{\beta}}(\lambda_1, \lambda_2)\) is sign consistent if condition (10) holds for λ_1, λ_2 > 0 such that λ_1/n → 0, \(\lambda_1/\sqrt{n} \to \infty\), and λ_2/λ_1 → c < ∞.
Theorem 3
Under (5)–(7), \(\hat{\boldsymbol{\beta}}(\lambda_1, \lambda_2)\) is sign consistent only if condition (11) holds for λ_1, λ_2 > 0 such that λ_2/n → 0.
Remark 1
Let \(\mathbf{C}^{n}_{12} = \mathbf{C}^{n}_{23} = \mathbf{0}\). Then conditions (10) and (11) are satisfied with left-hand sides equal to 0. Thus, the ridle estimator is sign consistent when predictors with nonzero coefficients are unrelated to predictors with zero coefficients.
Remark 2
Remark 3
Remark 4
In this case, if \(|\mathbf{D}^{n}_{1}\,\text{sign}(\boldsymbol{\beta}^{0}_{(1)})| \ge 1\), the sign consistency conditions for the \(\mathcal{M}\)-unpenalized lasso in (14) are violated, whereas the sign consistency conditions for the ridle may still be satisfied with suitably chosen λ_2.
Remark 5
When the mandatory covariates are irrelevant, \(\boldsymbol{\beta}^{0}_{(3)} = \mathbf{0}\), the offsetting terms in (10) and (11) for ridle sign consistency vanish. Indeed, with λ_2/n → 0, the ridle sign consistency conditions are equivalent to those of both the lasso and the \(\mathcal{M}\)-unpenalized lasso. However, this does not mean that the methods will perform similarly under finite samples. We examine their finite-sample performances in the Analysis of Simulated Data in the “Results” section.
Efficient algorithm
We provide an efficient algorithm for computing the ridle. Programming code, written in Fortran, and its R-language wrapper for the algorithm described in this section are freely available online at http://sites.google.com/site/zhongyindaye/software.
where \(s_j = \text{sign}(\mathbf{x}^T_j(\mathbf{y} - \sum_{k \neq j} \mathbf{x}_k \beta_k))\). The maximum value for λ_1 is \(\lambda_1^{max} = 2\max_{j \in \mathcal{O}} |\mathbf{x}^T_j(\mathbf{y} - \mathbf{X}_{\mathcal{M}}\hat{\boldsymbol{\beta}}_{\mathcal{M},0})|\), where \(\hat{\boldsymbol{\beta}}_{\mathcal{M},0} = (\mathbf{X}^T_{\mathcal{M}}\mathbf{X}_{\mathcal{M}} + \lambda_2\mathbf{I})^{-1}\mathbf{X}^T_{\mathcal{M}}\mathbf{y}\) are the initial estimates for coefficients of mandatory covariates. The matrix inverse in (15) can be computed efficiently by taking the inverse of individual eigenvalues added to λ_2 after an initial singular value decomposition of \(\mathbf{X}^T_{\mathcal{M}}\mathbf{X}_{\mathcal{M}}\).
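A minimal R sketch of one such scheme, blockwise coordinate descent, is given below: it alternates a ridge update of the mandatory block with soft-thresholding updates of the optional coefficients. The function and argument names are illustrative and not those of the published Fortran/R software.

```r
# Blockwise coordinate descent for the ridle (illustrative sketch).
# Minimizes ||y - X b||^2 + lambda1 * sum(|b[opt]|) + lambda2 * sum(b[man]^2).
ridle_cd <- function(X, y, opt, man, lambda1, lambda2,
                     max_iter = 1000, tol = 1e-8) {
  b <- rep(0, ncol(X))
  # The ridge system for the mandatory block is fixed; form it once
  A_man <- crossprod(X[, man, drop = FALSE]) + lambda2 * diag(length(man))
  xss <- colSums(X^2)                           # column sums of squares
  for (it in seq_len(max_iter)) {
    b_old <- b
    # (1) Ridge update of mandatory coefficients given the optional fit
    r <- y - X[, opt, drop = FALSE] %*% b[opt]
    b[man] <- solve(A_man, crossprod(X[, man, drop = FALSE], r))
    # (2) Soft-thresholding update of each optional coefficient
    for (j in opt) {
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]
      z <- drop(crossprod(X[, j], r_j))
      b[j] <- sign(z) * max(abs(z) - lambda1 / 2, 0) / xss[j]
    }
    if (max(abs(b - b_old)) < tol) break        # converged
  }
  b
}
```

Note that with all coefficients initialized at 0, the first ridge update reproduces the initial estimates \(\hat{\boldsymbol{\beta}}_{\mathcal{M},0}\) above, and an optional coefficient remains at 0 whenever \(2|\mathbf{x}^T_j \mathbf{r}| \le \lambda_1\), matching \(\lambda_1^{max}\).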
Results
Analysis of simulated data
We evaluate the performance of the ridle via simulation studies. We examine the effects of different magnitudes of coefficients, correlations between mandatory and irrelevant predictors, and degrees of multicollinearity among mandatory covariates. We compare the ridle to the ridge, lasso, elastic net, and the lasso and elastic net without penalization on the mandatory covariates. We use the R package glmnet 2.0 to compute the lasso and elastic net, where the penalization on mandatory covariates is specified using the penalty.factor option.
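For reference, the mandatory-unpenalized variants can be specified along the following lines; this is a sketch in which `X`, `y`, and the index vector `man` of mandatory columns are placeholders, and the elastic-net mixing value `alpha = 0.5` is only an example.

```r
library(glmnet)
# A penalty factor of 0 removes the penalty from the corresponding
# coefficients, leaving the mandatory covariates unpenalized.
pf <- rep(1, ncol(X))
pf[man] <- 0
fit_lasso <- cv.glmnet(X, y, alpha = 1, penalty.factor = pf)
fit_enet <- cv.glmnet(X, y, alpha = 0.5, penalty.factor = pf)
coef(fit_lasso, s = "lambda.min")
```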
Table 1 Simulation example 1: effect of signal strengths
Method  rpe  g-measure  Sensitivity  Specificity

β _{0}=0.5  Ridge  1.008 (0.009)  
Lasso  1.004 (0.018)  0.582 (0.009)  0.350 (0.018)  0.957 (0.006)  
Elastic net  0.923 (0.020)  0.676 (0.007)  0.600 (0.041)  0.848 (0.023)  
\(\mathcal {M}\)-unpenalized lasso  0.675 (0.028)  1.000 (0.000)  1.000 (0.000)  1.000 (0.000)
\(\mathcal {M}\)-unpenalized elastic net  0.697 (0.026)  1.000 (0.001)  1.000 (0.000)  1.000 (0.002)
Ridle  0.281 (0.016)  0.998 (0.001)  1.000 (0.000)  0.996 (0.002)  
β _{0}=1.5  Ridge  6.549 (0.056)  
Lasso  3.300 (0.083)  0.839 (0.005)  0.750 (0.017)  0.926 (0.003)  
Elastic net  3.230 (0.118)  0.853 (0.004)  0.900 (0.008)  0.850 (0.005)
\(\mathcal {M}\)-unpenalized lasso  0.691 (0.023)  1.000 (0.000)  1.000 (0.000)  1.000 (0.000)
\(\mathcal {M}\)-unpenalized elastic net  0.701 (0.028)  1.000 (0.001)  1.000 (0.000)  1.000 (0.001)
Ridle  0.473 (0.014)  0.998 (0.001)  1.000 (0.000)  0.996 (0.002)  
β _{0}=3  Ridge  24.559 (0.317)  
Lasso  8.074 (0.433)  0.908 (0.005)  0.900 (0.013)  0.935 (0.002)  
Elastic net  6.735 (0.339)  0.903 (0.002)  0.950 (0.013)  0.852 (0.003)  
\(\mathcal {M}\)-unpenalized lasso  0.676 (0.032)  1.000 (0.000)  1.000 (0.000)  1.000 (0.000)
\(\mathcal {M}\)-unpenalized elastic net  0.725 (0.030)  1.000 (0.000)  1.000 (0.000)  1.000 (0.001)
Ridle  0.605 (0.025)  0.998 (0.001)  1.000 (0.000)  0.996 (0.002) 
Table 2 Simulation example 2: effect of correlation between mandatory and irrelevant predictors
Method  rpe  g-measure  Sensitivity (\(\mathcal {M}\))  Sensitivity (\(\mathcal {O}\))  Specificity (\(\mathcal {O}\))

ρ _{0}=0.25  Ridge  1.671 (0.012)  
Lasso  1.911 (0.022)  0.383 (0.034)  0.100 (0.032)  0.200 (0.028)  0.975 (0.008)  
Elastic net  1.744 (0.019)  0.585 (0.015)  0.400 (0.054)  0.600 (0.050)  0.835 (0.036)  
\(\mathcal {M}\)-unpenalized lasso  1.741 (0.028)  0.742 (0.012)  1.000 (0.000)  0.200 (0.037)  0.938 (0.003)
\(\mathcal {M}\)-unpenalized elastic net  1.657 (0.017)  0.757 (0.008)  1.000 (0.000)  0.500 (0.064)  0.833 (0.022)
Ridle  1.492 (0.031)  0.773 (0.006)  1.000 (0.000)  0.200 (0.048)  0.931 (0.006)  
ρ _{0}=0.5  Ridge  1.807 (0.014)  
Lasso  2.045 (0.035)  0.571 (0.013)  0.300 (0.046)  0.400 (0.039)  0.925 (0.007)  
Elastic net  1.773 (0.034)  0.667 (0.008)  0.600 (0.014)  0.800 (0.048)  0.756 (0.020)  
\(\mathcal {M}\)-unpenalized lasso  1.922 (0.044)  0.794 (0.003)  1.000 (0.000)  0.400 (0.047)  0.929 (0.004)
\(\mathcal {M}\)-unpenalized elastic net  1.729 (0.040)  0.796 (0.007)  1.000 (0.000)  0.700 (0.048)  0.785 (0.022)
Ridle  1.438 (0.057)  0.852 (0.006)  1.000 (0.000)  0.600 (0.049)  0.900 (0.004)  
ρ _{0}=0.75  Ridge  1.564 (0.022)  
Lasso  1.365 (0.029)  0.684 (0.008)  0.400 (0.032)  0.600 (0.012)  0.900 (0.003)  
Elastic net  1.237 (0.030)  0.745 (0.005)  0.700 (0.048)  0.900 (0.011)  0.775 (0.014)  
\(\mathcal {M}\)-unpenalized lasso  1.423 (0.037)  0.839 (0.005)  1.000 (0.000)  0.700 (0.026)  0.904 (0.006)
\(\mathcal {M}\)-unpenalized elastic net  1.310 (0.041)  0.847 (0.005)  1.000 (0.000)  0.800 (0.012)  0.840 (0.008)
Ridle  0.886 (0.029)  0.875 (0.003)  1.000 (0.000)  0.700 (0.038)  0.908 (0.003) 
Table 3 Simulation example 3: effect of multicollinearity among mandatory covariates
Method  rpe  g-measure  Sensitivity (\(\mathcal {M}\))  Sensitivity (\(\mathcal {O}\))  Specificity (\(\mathcal {O}\))

ρ=0.75  Ridge  6.353 (0.022)  
Lasso  4.649 (0.167)  0.802 (0.011)  0.800 (0.000)  0.700 (0.048)  0.908 (0.004)  
Elastic net  4.410 (0.128)  0.804 (0.005)  1.000 (0.009)  0.700 (0.006)  0.858 (0.006)  
\(\mathcal {M}\)-unpenalized lasso  4.776 (0.260)  0.829 (0.005)  1.000 (0.000)  0.700 (0.031)  0.902 (0.007)
\(\mathcal {M}\)-unpenalized elastic net  5.402 (0.190)  0.823 (0.006)  1.000 (0.000)  0.700 (0.013)  0.871 (0.009)
Ridle  2.699 (0.152)  0.893 (0.007)  1.000 (0.000)  0.900 (0.048)  0.904 (0.004)  
ρ=0.9  Ridge  6.270 (0.026)  
Lasso  4.914 (0.148)  0.784 (0.010)  0.600 (0.089)  0.700 (0.036)  0.908 (0.004)  
Elastic net  4.336 (0.135)  0.816 (0.005)  0.800 (0.092)  0.700 (0.018)  0.867 (0.008)  
\(\mathcal {M}\)-unpenalized lasso  6.992 (0.337)  0.828 (0.008)  1.000 (0.000)  0.700 (0.031)  0.902 (0.006)
\(\mathcal {M}\)-unpenalized elastic net  7.245 (0.237)  0.827 (0.005)  1.000 (0.000)  0.700 (0.045)  0.860 (0.011)
Ridle  3.000 (0.214)  0.890 (0.006)  1.000 (0.000)  0.800 (0.045)  0.900 (0.004)  
ρ=0.99  Ridge  6.231 (0.031)  
Lasso  7.322 (0.200)  0.745 (0.005)  0.400 (0.000)  0.700 (0.000)  0.913 (0.003)  
Elastic net  5.003 (0.155)  0.804 (0.006)  0.800 (0.049)  0.700 (0.019)  0.883 (0.006)  
\(\mathcal {M}\)-unpenalized lasso  36.214 (2.064)  0.824 (0.006)  1.000 (0.000)  0.700 (0.046)  0.904 (0.005)
\(\mathcal {M}\)-unpenalized elastic net  33.583 (2.197)  0.830 (0.004)  1.000 (0.010)  0.700 (0.045)  0.867 (0.010)
Ridle  4.193 (0.343)  0.890 (0.005)  1.000 (0.000)  0.800 (0.029)  0.904 (0.004) 
Table 4 Simulation example 4: mandatory covariates are irrelevant
Method  rpe  g-measure  Specificity (\(\mathcal {M}\))  Sensitivity (\(\mathcal {O}\))  Specificity (\(\mathcal {O}\))

ρ _{0}=0.25  Ridge  1.671 (0.012)  
Lasso  1.911 (0.022)  0.383 (0.034)  1.000 (0.000)  0.200 (0.028)  0.975 (0.008)  
Elastic net  1.744 (0.019)  0.585 (0.015)  0.600 (0.053)  0.600 (0.050)  0.835 (0.036)  
\(\mathcal {M}\)-unpenalized lasso  2.357 (0.032)  0.215 (0.103)  0.000 (0.000)  0.050 (0.024)  0.995 (0.003)
\(\mathcal {M}\)-unpenalized elastic net  2.210 (0.034)  0.308 (0.054)  0.000 (0.000)  0.525 (0.065)  0.732 (0.057)
Ridle  1.854 (0.012)  0.309 (0.029)  0.000 (0.000)  0.100 (0.024)  0.982 (0.005)  
ρ _{0}=0.5  Ridge  1.807 (0.014)  
Lasso  2.045 (0.035)  0.571 (0.013)  0.800 (0.006)  0.400 (0.039)  0.925 (0.007)  
Elastic net  1.773 (0.034)  0.667 (0.008)  0.500 (0.048)  0.800 (0.048)  0.756 (0.020)  
\(\mathcal {M}\)-unpenalized lasso  2.242 (0.023)  0.299 (0.035)  0.000 (0.000)  0.100 (0.021)  0.982 (0.004)
\(\mathcal {M}\)-unpenalized elastic net  2.080 (0.028)  0.305 (0.094)  0.000 (0.000)  0.550 (0.072)  0.700 (0.079)
Ridle  1.801 (0.039)  0.528 (0.032)  0.000 (0.000)  0.300 (0.038)  0.943 (0.005)  
ρ _{0}=0.75  Ridge  1.564 (0.022)  
Lasso  1.365 (0.029)  0.684 (0.008)  0.700 (0.041)  0.600 (0.012)  0.900 (0.003)  
Elastic net  1.237 (0.030)  0.745 (0.005)  0.300 (0.046)  0.900 (0.011)  0.775 (0.014)  
\(\mathcal {M}\)-unpenalized lasso  1.747 (0.043)  0.428 (0.003)  0.000 (0.000)  0.200 (0.000)  0.964 (0.003)
\(\mathcal {M}\)-unpenalized elastic net  1.662 (0.043)  0.514 (0.016)  0.000 (0.000)  0.350 (0.023)  0.900 (0.015)
Ridle  1.253 (0.042)  0.596 (0.017)  0.000 (0.000)  0.400 (0.026)  0.945 (0.003) 
Example 1 (Effect of signal strengths)
This example has β_j = β_0 for j ∈ {1,…,10, 21,…,30} and β_j = 0 otherwise. Predictors are generated from X ∼ N(0, Σ), where \(\Sigma_{ij} = 0.5^{|i-j|}\), and σ = 3. We assume the mandatory covariates to be comprised of the relevant variables, so that \(\mathcal{M} = \{1, \ldots, 10, 21, \ldots, 30\}\).
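A data-generating sketch for this example follows; the sample size `n` and dimension `p` are placeholders, as they are not restated here, while the coefficient pattern, AR(1) covariance, and σ follow the description above.

```r
library(MASS)
set.seed(1)
n <- 100; p <- 40                      # placeholders for n and p
beta <- rep(0, p)
beta[c(1:10, 21:30)] <- 0.5            # beta_0 in {0.5, 1.5, 3}
Sigma <- 0.5^abs(outer(1:p, 1:p, "-")) # Sigma_ij = 0.5^|i-j|
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
y <- drop(X %*% beta) + rnorm(n, sd = 3)
man <- c(1:10, 21:30)                  # mandatory = relevant variables
opt <- setdiff(1:p, man)
```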
Table 1 displays prediction accuracy and variable selection performances for this example. First, by utilizing a priori information on mandatory covariates, the ridle has significantly smaller rpe's than those of the ridge, lasso, and elastic net, with or without penalization on mandatory covariates. Additionally, the ridle has a larger g-measure than those of the lasso and elastic net and a similar g-measure to the mandatory-unpenalized lasso and elastic-net methods. Sensitivity for the lasso and elastic net decreases dramatically as the signal strength weakens, i.e., as β_0 becomes smaller. On the other hand, as β_0 becomes larger, specificity for the lasso decreases while that of the elastic net increases. Furthermore, the lasso and elastic net without penalization on mandatory variables outperform the lasso and elastic net with penalization on the mandatory variables in terms of both prediction accuracy and variable selection. These results suggest that, even though the elastic net does a better job than the lasso in terms of prediction accuracy, both methods may not be able to distinguish well between mandatory and irrelevant variables, and incorporating a priori knowledge on mandatory covariates can yield significant improvements.
Example 2 (Effect of correlation between mandatory and irrelevant predictors)
In this example, we have β_j = 2 for j ∈ {2k : k = 1,…,10}, β_j = 1.5 for j ∈ {2k : k = 11,…,20}, and β_j = 0 otherwise. Predictors are generated from X ∼ N(0, Σ), where each element \(\Sigma_{ij} = \rho_0^{|i-j|}\). Thus, relevant predictors are interspersed with irrelevant ones, to which they are correlated. Further, we assume \(\mathcal{M} = \{2k : k = 11, \ldots, 20\}\) and σ = 6. \(\Sigma_{ij} = \rho_0^{|i-j|}\) presents an autocorrelated dependence structure, such that a variable x_j has a correlation of ρ_0 with its immediate neighbors x_{j−1} and x_{j+1} for 1 < j < p. When ρ_0 is large, each variable is highly correlated with its immediate neighbors, resulting in multicollinearity.
Table 2 presents prediction accuracy and variable selection performances for this example. The ridle performs the best in terms of rpe's. When ρ_0 is large at 0.75, the mandatory covariates are strongly correlated with some of the optional variables, and the \(\mathcal{M}\)-unpenalized lasso performs the worst in terms of prediction accuracy, except for the ridge. This corroborates the comments in the Two-Predictor Case of the “Methods” section, which suggest that the \(\mathcal{M}\)-unpenalized lasso can have large prediction errors under multicollinearity. Further, the ridle performs the best in terms of g-measures for overall variable selection in all scenarios.
Example 3 (Effect of multicollinearity among mandatory covariates)
Here, we have β_j = 3 for j ∈ {1,…,5}, β_j = 1.5 for j ∈ {6,…,10}, β_j = 2 for j ∈ {16,…,20}, and β_j = 0 otherwise. We set σ = 3 and assume \(\mathcal{M} = \{16, \ldots, 20\}\). Let Z ∼ N(0,1) and ε_x ∼ N(0,1). We generate predictors as \(\mathbf{x}_j = Z + \sqrt{(1-\rho)/\rho}\, \epsilon_x\) for \(j \in \mathcal{M}\) and x_j ∼ N(0,1) otherwise. This creates correlations of ρ among the mandatory covariates.
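The mandatory block can be generated as below; the construction yields pairwise correlation ρ because \(\text{Var}(\mathbf{x}_j) = 1 + (1-\rho)/\rho = 1/\rho\) while \(\text{Cov}(\mathbf{x}_j, \mathbf{x}_k) = \text{Var}(Z) = 1\). The sample size is a placeholder.

```r
set.seed(1)
n <- 100; rho <- 0.9                   # n is a placeholder
Z <- rnorm(n)
# Common factor Z plus independent noise for the 5 mandatory covariates
X_man <- sapply(1:5, function(j) Z + sqrt((1 - rho) / rho) * rnorm(n))
round(cor(X_man), 2)                   # off-diagonals are close to rho
```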
In Table 3, we see that sensitivity (\(\mathcal{M}\)) decreases for the lasso and elastic net as ρ increases. Additionally, the lasso and elastic net without penalization on mandatory variables have identical sensitivity (\(\mathcal{M}\)) to the ridle. Furthermore, prediction error for the lasso without penalization on mandatory covariates increases dramatically as ρ increases, whereas the ridle has the lowest rpe's. This corroborates Remark 3 of Variable Selection Consistency of the Ridle in the “Methods” section, which suggests that the ridle may outperform the lasso when mandatory variables are highly correlated among themselves.
Example 4 (Mandatory covariates are irrelevant)
We repeat the simulation setting from example 2, but with the mandatory covariates defined as \(\mathcal {M}= \{ 2k1: k=1, \ldots, 10 \}\). In this case, the mandatory covariates are irrelevant.
Table 4 presents prediction accuracy and variable selection performances for this scenario. The ridle underperforms the elastic net but outperforms all other variable selection methods in terms of prediction accuracy. Indeed, the ridle has significantly smaller rpe's compared with the \(\mathcal{M}\)-unpenalized lasso and \(\mathcal{M}\)-unpenalized elastic net. Moreover, the ridle underperforms both the lasso and elastic net in terms of g-measures for overall variable selection. However, the ridle outperforms the \(\mathcal{M}\)-unpenalized lasso and \(\mathcal{M}\)-unpenalized elastic net at ρ_0 = 0.5 and ρ_0 = 0.75, when the irrelevant mandatory covariates are moderately and highly correlated, respectively, with some of the relevant optional variables. These results suggest that the ridle, although not designed to exclude mandatory covariates when they are irrelevant, can be more advantageous than related methods that include mandatory covariates, as the ridle penalizes coefficients of irrelevant mandatory covariates towards, although not exactly to, 0 with the ridge penalty.
Gene expression analysis on histologic grades of breast cancer
Histologic grades are an important determinant of the aggressive potential of breast cancers and are of practical importance in the assessment and choice of treatment options. In this section, we apply our proposed method to a microarray gene expression dataset to determine genes that may be predictive of breast tumor histologic grade [22]. In this experiment, 251 frozen tumor tissues were collected from primary breast cancer patients, and more than 12,000 genes were assayed. We removed 2 subjects with missing outcomes and performed our analysis with the remaining 249 observations. Clinicopathological variables, such as ER status, PgR status, age, and tumor size, measured at diagnosis, were obtained from patient records. Histologic grades are based on the widely used Nottingham Histologic Score system for prognosis of breast cancer [23]. There are three factors that pathologists consider in this scoring system: cell differentiation, nuclear features, and mitotic activity [24]. Considerations of these factors allow the Nottingham Prognostic Index (NPI) to provide comprehensive prognosis of breast cancers. The three factors are each assigned a score from 1–3 based on clinical observations. A tumor is assigned a score of 1, 2, or 3 for cell differentiation if >75%, 10–75%, or <10% of the tumor area forms glandular structures, respectively. A tumor has a score of 1, 2, or 3 for nuclear features if nuclei show little increase in size, are larger than normal breast epithelial cells, or show prominent nucleoli with occasionally very large sizes, respectively. Further, breast tumors have scores of 1, 2, or 3 for mitotic activity if ≤7, 8–14, or ≥15 mitoses per 10 high-power microscopic fields are observed, respectively. Overall tumor grades are obtained by summing the scores for the three factors. Breast tumors with total scores of 3–5, 6–7, and 8–9 are assigned tumor grades 1 (low), 2 (intermediate), and 3 (high), respectively, which represent the aggressive potential of breast tumors. The higher the grade, the more likely the tumor is to spread or become aggressive. This dataset is available at the NCBI Gene Expression Omnibus (GEO) repository with GEO accession GSE3294. We focused our analysis on 430 genes from several well-known cancer-related pathways: PI3K [25, 26], p53 signaling [27–29], VEGF [30, 31], Hedgehog signaling [32, 33], ErbB signaling [34, 35], Ras signaling [36, 37], and the ion-channel family [38, 39].
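The grade assignment described above is a simple scoring rule; the following helper (ours, for illustration only) maps the three component scores to an overall grade.

```r
# Map Nottingham component scores (1-3 each) to an overall histologic grade:
# totals of 3-5, 6-7, and 8-9 give grades 1, 2, and 3, respectively.
nottingham_grade <- function(differentiation, nuclear, mitotic) {
  total <- differentiation + nuclear + mitotic
  cut(total, breaks = c(2, 5, 7, 9), labels = c(1, 2, 3))
}
nottingham_grade(2, 3, 3)   # total = 8, so grade 3 (high)
```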
Significant genes are selected as predictors of breast tumor grade across the 7 pathways by utilizing a sparse regression approach [40, 41]. In this strategy, tumor grade is regressed upon both the 4 clinicopathological covariates (ER status, PgR status, age, and tumor size) and the 430 gene expression levels, and significant predictors of tumor grade, among clinical covariates and genes, are identified if they are retained in the sparse regression analysis. We applied the ridle to perform variable selection on gene expression levels while conditioning on the 4 clinicopathological variables, which we incorporated as mandatory covariates. We further compared our results with those from the ridge, lasso, elastic net, and the lasso and elastic net without penalization on the 4 mandatory covariates.
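Schematically, and reusing the illustrative `ridle_cd` sketch from the Methods section, the analysis takes the following form; `clin`, `genes`, `grade`, `lam1`, and `lam2` are placeholders, with the tuning parameters chosen, for example, by cross-validation.

```r
# Regress tumor grade on clinical covariates (mandatory) and genes (optional);
# selected genes are those with nonzero estimated coefficients.
X <- cbind(clin, genes)       # 249 x 434 design: 4 clinical + 430 genes
man <- 1:4                    # clinicopathological covariates
opt <- 5:ncol(X)              # gene expression levels
b <- ridle_cd(X, grade, opt, man, lambda1 = lam1, lambda2 = lam2)
selected_genes <- colnames(genes)[b[opt] != 0]
```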
Table 5 Gene expression analysis on histologic grades of breast cancer
Method  No. selected \(\mathcal {M}\)  No. selected \(\mathcal {O}\)  MSE

Ridge  4  430  0.487 
Lasso  2  19  0.260 
Elastic net  2  14  0.286 
\(\mathcal {M}\)-unpenalized lasso  4  21  0.257
\(\mathcal {M}\)-unpenalized elastic net  4  7  0.296
Ridle  4  24  0.239 
Cells are continuously exposed to stimuli from paracrine and endocrine factors. It is essential that extracellular signals are interpreted by the cell correctly in order to facilitate a proper proliferative response. The ErbB family belongs to the receptor tyrosine kinase family and plays pivotal roles in this process [34]. Members of the ErbB signaling pathway have been suggested as potential therapeutic targets [42]. Initial studies have also suggested that expression levels of AREG (amphiregulin) are associated with larger and more aggressive tumors through cell proliferation [43, 44]. Only the ridle identified AREG as predictive of histologic grades of breast cancers.
The other gene selected only by the ridle is TRPM4 from the ion-channel family. Research over the past few years has shown that ion channels are involved in the progression and pathology of a myriad of human cancers [39, 45, 46]. In addition, ion channels are known to play critical roles in gene expression, hormone secretion, cell volume regulation, and cell proliferation [47, 48]. The expression levels of ion-channel genes, including TRPM4, have been found to be predictive of and significantly associated with tumor progression [38].
Breast cancer is known to be highly correlated with hormone secretion. Breast tumors that are ER- or PgR-positive are much more likely to respond to hormone therapy than tumors that are negative. Many of these may not be related to histologic grades of breast cancer. For example, in a previous study, twenty-four ion-channel genes were found to be differentially expressed between ER-negative and ER-positive tumors [38]. However, in our analysis, we only identified 1 gene, TRPM4, from the ion-channel family to be predictive of histologic grades of cancer. Thus, many of the 430 breast-cancer-related genes may not be predictive of histologic grades but are expected to be highly correlated with the mandatory covariates, i.e., ER and PgR statuses. As suggested by both theoretical and simulation studies, the ridle can be advantageous when mandatory variables are correlated with the irrelevant optional ones. Results from the gene expression analysis further validate and demonstrate the performance of the ridle under this commonly seen scenario.
Discussion
In this article, we proposed the ridle for sparse regression with mandatory covariates. We provided both theoretical and simulation studies that demonstrated the efficacy of our method. In particular, our results suggest that the ridle may outperform the lasso and elastic net when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. The ridle can also improve upon performances of the lasso and elastic net when mandatory covariates have small or moderate effects.
We employed the L1-norm penalty to induce sparsity on the optional set. This penalty is chosen for its simplicity, computational ease, and successes in a myriad of applications; for example, L1-norm penalized regressions have been successfully applied in large-scale genome-wide association [3] and eQTL data studies [49]. However, other sparse regularization methods, such as the SCAD [7], adaptive lasso [8], and Dantzig selector [9], can also be utilized in place of the L1-norm penalty in (1).
The ridle is related to the elastic net [41], which also employs both the L1-norm and L2-norm penalties. However, the elastic net applies both penalties upon all coefficients, whereas the ridle applies the L1-norm penalty to coefficients of the optional set and the L2-norm penalty to coefficients of the mandatory set for simultaneous estimation of mandatory covariates while allowing selection for others.
In this article, we applied our method to gene expression analysis, where we identified more genes related to tumor grade while incorporating clinicopathological variables as mandatory covariates. In addition, the ridle can be applied in a myriad of other genomic studies where mandatory covariates are routinely required, such as when clinical, demographical, or experimental effects have to be incorporated in regression analysis of genomic data sets.
Conclusions
In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Mandatory covariates are routinely encountered in the analysis of genetic and biomedical data. For example, additional covariates describing clinical, demographical, or experimental effects need to be included a priori without subjecting them to variable selection. Results suggest that the ridle may outperform current methods when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.
Abbreviations
NPI: Nottingham Prognostic Index
OLS: ordinary least squares
Ridle: ridge-lasso hybrid estimator
SCAD: smoothly clipped absolute deviation
Declarations
Acknowledgements
We would like to thank Gregor Stiglic and Yuan Jiang for their generous comments; improvements were made to the article based on their suggestions.
Funding
None.
Availability of data and materials
The dataset used for the gene expression analysis on histologic grades of breast cancer in the “Results” section can be accessed at the NCBI Gene Expression Omnibus (GEO) repository with GEO accession GSE3294.
Authors’ contributions
JZ performed the genetic assessment of histologic grades of breast cancer. ZJD planned and JZ performed the simulation studies. ZJD developed the regression method and implemented the algorithm. ZJD developed and JZ validated the theoretical results. CHH and ZJD supervised the project. JZ, CHH, and ZJD all participated in project development and in writing the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
1. Liu ZQ, et al. Gene and pathway identification with l(p) penalized Bayesian logistic regression. BMC Bioinformatics. 2008; 412:1–19.
2. Logsdon BA, Mezey J. Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations. PLoS Comput Biol. 2010; 6:1001014.
3. Wu TT, et al. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics. 2009; 25:714–21.
4. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996; 58:267–88.
5. Efron B, et al. Least angle regression. Ann Stat. 2004; 32:407–99.
6. Friedman J, et al. Pathwise coordinate optimization. Ann Appl Stat. 2007; 1:302–32.
7. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96:1348–60.
8. Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006; 101:1418–29.
9. Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat. 2007; 35:2313–51.
10. Trojani M, Contesso G, Coindre J, Rouesse J, Bui N, De Mascarel A, Goussot J, David M, Bonichon F, Lagarde C. Soft-tissue sarcomas of adults; study of pathological prognostic variables and definition of a histopathological grading system. Int J Cancer. 1984; 33(1):37–42.
11. Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW, Regev A, Weinberg RA. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet. 2008; 40(5):499–507.
12. Gleason DF, Mellinger GT. Prediction of prognosis for prostatic adenocarcinoma by combined histological grading and clinical staging. J Urol. 1974; 111(1):58–64.
13. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006; 98(4):262–72.
14. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970; 12:55–67.
15. Hoerl AE, Kennard RW. Ridge regression: applications to nonorthogonal problems. Technometrics. 1970; 12:69–82.
16. Marquardt DW, Snee RD. Ridge regression in practice. Am Stat. 1975; 29:3–20.
17. Zhao P, Yu B. On model selection consistency of lasso. J Mach Learn Res. 2006; 7:2541–67.
18. Stadler N, Buhlmann P, van de Geer S. l1-penalization for mixture regression models. Test. 2010; 19:209–56.
19. Tseng P. Coordinate ascent for maximizing nondifferentiable concave functions. Technical Report LIDS-P-1840, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems. 1988.
20. Tseng P. Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl. 2001; 109:474–94.
21. Friedman J, et al. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33(1):1–22.
22. Van De Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347(25):1999–2009.
23. Bloom H, Richardson W. Histological grading and prognosis in breast cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer. 1957; 11(3):359.
24. Galea MH, Blamey RW, Elston CE, Ellis IO. The Nottingham Prognostic Index in primary breast cancer. Breast Cancer Res Treat. 1992; 22(3):207–19.
25. Engelman JA. Targeting PI3K signalling in cancer: opportunities, challenges and limitations. Nat Rev Cancer. 2009; 9(8):550–62.
26. Berns K, Horlings HM, Hennessy BT, Madiredjo M, Hijmans EM, Beelen K, Linn SC, Gonzalez-Angulo AM, Stemke-Hale K, Hauptmann M, et al. A functional genetic approach identifies the PI3K pathway as a major determinant of trastuzumab resistance in breast cancer. Cancer Cell. 2007; 12(4):395–402.
27. Levine AJ. p53, the cellular gatekeeper for growth and division. Cell. 1997; 88(3):323–31.
28. Sherr CJ, McCormick F. The RB and p53 pathways in cancer. Cancer Cell. 2002; 2(2):103–12.
29. Gasco M, Shami S, Crook T. The p53 pathway in breast cancer. Breast Cancer Res. 2002; 4(2):70.
30. Skobe M, Hawighorst T, Jackson DG, Prevo R, Janes L, Velasco P, Riccardi L, Alitalo K, Claffey K, Detmar M. Induction of tumor lymphangiogenesis by VEGF-C promotes breast cancer metastasis. Nat Med. 2001; 7(2):192–8.
31. Jain RK, Duda DG, Clark JW, Loeffler JS. Lessons from phase III clinical trials on anti-VEGF therapy for cancer. Nat Clin Pract Oncol. 2006; 3(1):24–40.
32. Kubo M, Nakamura M, Tasaki A, Yamanaka N, Nakashima H, Nomura M, Kuroki S, Katano M. Hedgehog signaling pathway is a new therapeutic target for patients with breast cancer. Cancer Res. 2004; 64(17):6071–4.
33. Taipale J, Beachy PA. The Hedgehog and Wnt signalling pathways in cancer. Nature. 2001; 411(6835):349–54.
34. Olayioye MA, Neve RM, Lane HA, Hynes NE. The ErbB signaling network: receptor heterodimerization in development and cancer. EMBO J. 2000; 19(13):3159–67.
35. Harari D, Yarden Y. Molecular mechanisms underlying ErbB2/HER2 action in breast cancer. Oncogene. 2000; 19(53):6102–14.
36. Downward J. Targeting RAS signalling pathways in cancer therapy. Nat Rev Cancer. 2003; 3(1):11–22.
37. Clark GJ, Der CJ. Aberrant function of the Ras signal transduction pathway in human breast cancer. Breast Cancer Res Treat. 1995; 35(1):133–44.
38. Ko JH, Ko EA, Gu W, Lim I, Bang H, Zhou T. Expression profiling of ion channel genes predicts clinical outcome in breast cancer. Mol Cancer. 2013; 12(1):1.
39. Kunzelmann K. Ion channels and cancer. J Membr Biol. 2005; 205(3):159–73.
40. Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006; 34:1436–62.
41. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B. 2005; 67:301–20.
42. Foley J, Nickerson NK, Nam S, Allen KT, Gilmore JL, Nephew KP, Riese DJ. EGFR signaling in breast cancer: bad to the bone. Semin Cell Dev Biol. 2010; 21(9):951–60.
43. Ma L, de Roquancourt A, Bertheau P, Chevret S, Millot G, Sastre-Garau X, Espié M, Marty M, Janin A, Calvo F. Expression of amphiregulin and epidermal growth factor receptor in human breast cancer: analysis of autocriny and stromal-epithelial interactions. J Pathol. 2001; 194(4):413–9.
44. Suo Z, Risberg B, Karlsson MG, Villman K, Skovlund E, Nesland JM. The expression of EGFR family ligands in breast carcinomas. Int J Surg Pathol. 2002; 10(2):91–9.
45. Fiske JL, Fomin VP, Brown ML, Duncan RL, Sikes RA. Voltage-sensitive ion channels and cancer. Cancer Metastasis Rev. 2006; 25(3):493–500.
46. Roger S, Potier M, Vandier C, Besson P, Le Guennec JY. Voltage-gated sodium channels: new targets in cancer therapy? Curr Pharm Des. 2006; 12(28):3681–95.
47. Camerino DC, Tricarico D, Desaphy JF. Ion channel pharmacology. Neurotherapeutics. 2007; 4(2):184–98.
48. Camerino DC, Desaphy JF, Tricarico D, Pierno S, Liantonio A. Therapeutic approaches to ion channel diseases. Adv Genet. 2008; 64:81–145.
49. Lee S, et al. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 2009; 5:1000358.