Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs
Tinbergen Institute Discussion Paper 14-089/I
31 Pages Posted: 16 Jul 2014 Last revised: 21 Sep 2015
Date Written: June 28, 2014
Abstract
We develop an algorithm that incorporates network information into regression settings. It simultaneously estimates the covariate coefficients and the signs of the network connections (i.e. whether the connections are of an activating or of a repressing type). For the coefficient estimation steps an additional penalty is set on top of the lasso penalty, similarly to Li and Li (2008). We develop a fast implementation for the new method based on coordinate descent. Furthermore, we show how the new methods can be applied to time-to-event data. The new method yields good results in simulation studies concerning sensitivity and specificity of non-zero covariate coefficients, estimation of network connection signs, and prediction performance. We also apply the new method to two microarray time-to-event data sets from patients with ovarian cancer and diffuse large B-cell lymphoma. The new method performs very well in both cases. The main application of this new method is of biomedical nature, but it may also be useful in other fields where network data is available.
Keywords: high-dimensional data, gene expression data, pathway information, penalized regression
JEL Classification: C13, C41, C55
Suggested Citation: Suggested Citation