Ye Guibo, Tang Mengfan, Cai Jian-Feng, Nie Qing, Xie Xiaohui
Department of Computer Science, University of California Irvine, Irvine, California, United States of America ; Department of Mathematics, University of California Irvine, Irvine, California, United States of America.
Department of Computer Science, University of California Irvine, Irvine, California, United States of America.
PLoS One. 2013 Dec 17;8(12):e82146. doi: 10.1371/journal.pone.0082146. eCollection 2013.
Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, the high noise of experimental measurements, and the insufficient number of such measurements. Imposing additional constraints through strong, biologically motivated regularizations is critical to developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that low-rank regularization improves the accuracy of gene expression prediction on all three.
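The abstract does not spell out the optimization details, so the following is only a minimal sketch of what a low-rank regularized multi-target linear regression could look like. It assumes the rank constraint is relaxed to the nuclear norm (a common convex surrogate for rank, not necessarily the authors' exact formulation) and solves the resulting problem by proximal gradient descent with singular value thresholding. The function names (`svt`, `objective`, `low_rank_regression`), the penalty weight `lam`, and the synthetic data are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt


def objective(X, Y, W, lam):
    """0.5 * ||Y - X W||_F^2 + lam * ||W||_* (assumed surrogate objective)."""
    residual = Y - X @ W
    return 0.5 * np.sum(residual ** 2) + lam * np.linalg.norm(W, "nuc")


def low_rank_regression(X, Y, lam, n_iter=500):
    """Proximal gradient descent on the nuclear-norm regularized problem.

    X: (samples, regulators) expression of regulators
    Y: (samples, targets) expression of target genes
    Returns W: (regulators, targets) connectivity matrix estimate.
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant
    # of the gradient of the least-squares term.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = svt(W - step * grad, step * lam)
    return W


# Synthetic illustration (not data from the paper): a connectivity
# matrix built from a small number of regulatory "modules".
rng = np.random.default_rng(0)
n_samples, n_regulators, n_targets, true_rank = 60, 20, 30, 3
W_true = rng.normal(size=(n_regulators, true_rank)) @ rng.normal(
    size=(true_rank, n_targets)
)
X = rng.normal(size=(n_samples, n_regulators))
Y = X @ W_true + 0.1 * rng.normal(size=(n_samples, n_targets))

lam = 20.0  # hypothetical penalty weight; in practice chosen by cross-validation
W_hat = low_rank_regression(X, Y, lam)
singular_values = np.linalg.svd(W_hat, compute_uv=False)
relative_error = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
```

Inspecting `singular_values` shows how many independent connectivity patterns the fit retains; increasing `lam` pushes more singular values to zero, trading fit quality for a lower effective rank.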