Liu Li-Zhi, Wu Fang-Xiang, Zhang Wen-Jun
BMC Syst Biol. 2014;8 Suppl 3(Suppl 3):S1. doi: 10.1186/1752-0509-8-S3-S1. Epub 2014 Oct 22.
As an abstract mapping of gene regulation in the cell, a gene regulatory network is important to both biological research and practical applications. Reverse engineering gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets may be collected for a specific gene network under different circumstances. Integrating these multiple datasets can improve the inference of a gene regulatory network. Gene expression data are also known to be sometimes contaminated with large errors or outliers, which may affect the inference results.
A novel method, Huber group LASSO, is proposed to infer the underlying network topology shared by multiple time-course gene expression datasets while remaining robust to large errors or outliers. To solve the optimization problem involved in the proposed method, an efficient algorithm that combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to the proposed method to obtain a network topology whose edges carry scores. The proposed method is applied to both simulated datasets and real experimental datasets. The results show that Huber group LASSO outperforms group LASSO in terms of both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR).
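To make the objective and the solver strategy more concrete, here is a minimal pure-Python sketch. It assumes a linear model in which, for one target gene, the response in each of K datasets is regressed on p candidate regulators, and the K coefficients of regulator j (one per dataset) form one group, so an edge is kept or removed jointly across datasets. The Huber scaling (r²/2 inside [-M, M], M|r| - M²/2 outside), the names `huber_group_lasso`, `objective`, `lam`, `M`, `sweeps`, and the specific block update (one majorized proximal step per group, using the fact that the Huber derivative is 1-Lipschitz) are illustrative assumptions, not the paper's exact algorithm.

```python
import math


def huber(r, M):
    """Huber loss (assumed scaling): quadratic for |r| <= M, linear beyond."""
    a = abs(r)
    return 0.5 * r * r if a <= M else M * a - 0.5 * M * M


def huber_psi(r, M):
    """Derivative of the Huber loss: the residual clipped to [-M, M]."""
    return max(-M, min(M, r))


def objective(X, y, B, lam, M):
    """Sum of Huber losses over all datasets plus lam * sum_j ||group_j||_2.

    X[k][t][j]: regulator j at sample t of dataset k; y[k][t]: target response;
    B[k][j]: coefficient of regulator j in dataset k; group j = (B[0][j], ..., B[K-1][j]).
    """
    K, p = len(X), len(X[0][0])
    loss = 0.0
    for k in range(K):
        for row, yt in zip(X[k], y[k]):
            r = yt - sum(xj * bj for xj, bj in zip(row, B[k]))
            loss += huber(r, M)
    penalty = sum(math.sqrt(sum(B[k][j] ** 2 for k in range(K))) for j in range(p))
    return loss + lam * penalty


def huber_group_lasso(X, y, lam, M=1.0, sweeps=200):
    """Hypothetical solver: cyclic block descent in which each group step minimizes
    a quadratic majorizer (auxiliary function) of the Huber loss plus the exact penalty."""
    K, p = len(X), len(X[0][0])
    B = [[0.0] * p for _ in range(K)]
    R = [list(yk) for yk in y]  # residuals y - X beta, updated incrementally
    # psi is 1-Lipschitz, so L_j = max_k ||x_j^(k)||^2 bounds the curvature of block j.
    L = [max(sum(row[j] ** 2 for row in X[k]) for k in range(K)) for j in range(p)]
    history = [objective(X, y, B, lam, M)]
    for _ in range(sweeps):
        for j in range(p):
            if L[j] == 0.0:
                continue
            # Gradient of the Huber loss w.r.t. the K coefficients of group j.
            g = [-sum(row[j] * huber_psi(r, M) for row, r in zip(X[k], R[k]))
                 for k in range(K)]
            u = [B[k][j] - g[k] / L[j] for k in range(K)]
            norm_u = math.sqrt(sum(v * v for v in u))
            # Group soft-thresholding: edge j is zeroed in all datasets or kept in all.
            scale = max(0.0, 1.0 - lam / (L[j] * norm_u)) if norm_u > 0 else 0.0
            for k in range(K):
                delta = scale * u[k] - B[k][j]
                if delta != 0.0:
                    for t, row in enumerate(X[k]):
                        R[k][t] -= row[j] * delta
                    B[k][j] += delta
        history.append(objective(X, y, B, lam, M))
    return B, history


# Usage on toy data: two datasets, two orthogonal candidate regulators.
x0 = [1, -1, 1, -1, 1, -1, 1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[[a, b] for a, b in zip(x0, x1)] for _ in range(2)]
y1 = [2.0 * a for a in x0]
y1[0] += 50.0  # one gross outlier in dataset 1
y2 = [1.5 * a for a in x0]
B, hist = huber_group_lasso(X, [y1, y2], lam=0.1, M=1.0, sweeps=500)
```

Because each group step minimizes an upper bound that touches the objective at the current point, the recorded objective values never increase, which is the kind of monotone behaviour a convergence analysis of such an auxiliary-function scheme builds on. In the toy run, the spurious coefficient of regulator 1 in dataset 1 stays small, whereas ordinary least squares on the same orthogonal columns would assign it 50/8 = 6.25 because of the single outlier.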
The convergence analysis of the algorithm shows theoretically that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulated and real data examples demonstrate the effectiveness of Huber group LASSO in integrating multiple time-course gene expression datasets and in improving resistance to large errors or outliers.