IEEE/ACM Trans Comput Biol Bioinform. 2018 May-Jun;15(3):944-953. doi: 10.1109/TCBB.2016.2640303. Epub 2016 Dec 15.
Molecular profiling data (e.g., gene expression) have been used for clinical risk prediction and biomarker discovery. However, improving the predictive ability and biological interpretability of biomarkers requires integrating prior knowledge such as biological pathways or gene interaction networks. Here, we first introduce a general regularized Logistic Regression (LR) framework with regularization term $\lambda \|w\|_1 + \eta\, w^T M w$, which reduces to different penalties, including Lasso, elastic net, and network-regularized terms, for different choices of the matrix $M$. This framework can be solved in a unified manner by a cyclic coordinate descent algorithm, which avoids matrix inversion and speeds up computation. However, if the estimated coefficients $w_i$ and $w_j$ of two connected genes have opposite signs, the traditional network-regularized penalty may not perform well. To address this, we introduce a novel network-regularized sparse LR model with a new penalty $\lambda \|w\|_1 + \eta\, |w|^T M |w|$ that considers the difference between the absolute values of the coefficients. We develop two efficient algorithms to solve it. Finally, we test our methods on simulated and real data and compare them with related methods to demonstrate their efficiency.
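The difference between the two penalties can be illustrated with a small numerical sketch. It assumes the regularization term has the form $\lambda \|w\|_1 + \eta\, w^T M w$ and the new penalty the form $\lambda \|w\|_1 + \eta\, |w|^T M |w|$. The function names, the two-gene toy network, and the use of an unnormalized graph Laplacian for $M$ are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def network_penalty(w, M, lam, eta):
    """lam * ||w||_1 + eta * w^T M w (assumed general form)."""
    return lam * np.abs(w).sum() + eta * (w @ M @ w)

def abs_network_penalty(w, M, lam, eta):
    """lam * ||w||_1 + eta * |w|^T M |w| (assumed form of the new penalty)."""
    a = np.abs(w)
    return lam * a.sum() + eta * (a @ M @ a)

# Hypothetical toy network: two genes joined by one edge.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A      # unnormalized Laplacian, an assumption

# Coefficients with equal magnitude but opposite signs.
w = np.array([1.0, -1.0])
lam, eta = 1.0, 1.0

lasso  = network_penalty(w, np.zeros((2, 2)), lam, eta)  # M = 0: L1 term only
enet   = network_penalty(w, np.eye(2), lam, eta)         # M = I: L1 + squared L2
net    = network_penalty(w, L, lam, eta)                 # M = L: penalizes (w_1 - w_2)^2
absnet = abs_network_penalty(w, L, lam, eta)             # penalizes (|w_1| - |w_2|)^2

print(lasso, enet, net, absnet)
```

With opposite-sign coefficients, the Laplacian quadratic term charges the pair for differing even though their magnitudes agree, whereas the absolute-value variant adds nothing beyond the L1 term. This is the situation the abstract describes as problematic for the traditional network-regularized penalty.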