Zuo Yiming, Yu Guoqiang, Tadesse Mahlet G, Ressom Habtom W
Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA; Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.
Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.
Methods. 2014 Oct 1;69(3):266-73. doi: 10.1016/j.ymeth.2014.06.010. Epub 2014 Jul 5.
Biological network inference is a major challenge in systems biology. Traditional correlation-based network analysis produces too many spurious edges because correlation cannot distinguish between direct and indirect associations. To address this issue, Gaussian graphical models (GGM) were proposed and have been widely used. Although GGM can substantially reduce the number of spurious edges, they are insufficient to uncover a network structure faithfully because they consider only the full-order partial correlation. Moreover, when the number of samples is smaller than the number of variables, the sample covariance matrix is singular, so GGM must additionally incorporate techniques based on sparse regularization to solve the covariance inversion problem. In this paper, we propose an efficient and mathematically solid algorithm that infers biological networks by computing low order partial correlations (LOPC) up to the second order. The bias introduced by the low order constraint is small compared with the more reliable approximation of the network structure that it achieves. In addition, the algorithm is suitable for datasets with small sample sizes but large numbers of variables. Simulation results show that LOPC yields far fewer spurious edges and works well under various conditions commonly seen in practice. The application to a real metabolomics dataset further validates the performance of LOPC and suggests its potential for detecting novel biomarkers of complex disease.
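To make the idea of "low order partial correlation up to the second order" concrete, here is a minimal sketch, not the authors' implementation. It uses the standard recursion r_ij.k = (r_ij - r_ik r_jk) / sqrt((1 - r_ik^2)(1 - r_jk^2)), applied once more to obtain second-order terms. The pruning rule is an assumption made for illustration: an edge (i, j) is kept only if the correlation and every first- and second-order partial correlation stay above a hypothetical fixed `threshold` in absolute value. The paper's actual test (for example, a significance test on each coefficient) may differ, and the names `lopc_edges`, `pcor1` and `pcor2` are hypothetical. Because no full covariance matrix is ever inverted, the sketch works even when there are fewer samples than variables.

```python
import itertools
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(data):
    """Zero-order correlations; data is a list of variables, each a list of samples."""
    p = len(data)
    return [[1.0 if i == j else pearson(data[i], data[j]) for j in range(p)]
            for i in range(p)]


def pcor1(r, i, j, k):
    """First-order partial correlation r_{ij.k} from the correlation matrix r."""
    return (r[i][j] - r[i][k] * r[j][k]) / math.sqrt(
        (1 - r[i][k] ** 2) * (1 - r[j][k] ** 2))


def pcor2(r, i, j, k, l):
    """Second-order partial correlation r_{ij.kl}, built from first-order terms given k."""
    rij, ril, rjl = pcor1(r, i, j, k), pcor1(r, i, l, k), pcor1(r, j, l, k)
    return (rij - ril * rjl) / math.sqrt((1 - ril ** 2) * (1 - rjl ** 2))


def lopc_edges(data, threshold):
    """Hypothetical rule: keep edge (i, j) only if |correlation| exceeds threshold
    for every conditioning set of size 0, 1 or 2."""
    r = corr_matrix(data)
    p = len(data)
    edges = []
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        stats = [r[i][j]]
        stats += [pcor1(r, i, j, k) for k in others]
        stats += [pcor2(r, i, j, k, l) for k, l in itertools.combinations(others, 2)]
        if min(abs(s) for s in stats) > threshold:
            edges.append((i, j))
    return edges


# Toy chain x0 -> x1 -> x2 -> x3 built from orthogonal, centred +/-1 vectors, so
# the sample correlations are exact: every pair of variables is correlated,
# but only neighbouring variables are directly linked.
e = [[1, -1, 1, -1, 1, -1, 1, -1],
     [1, 1, -1, -1, 1, 1, -1, -1],
     [1, -1, -1, 1, 1, -1, -1, 1],
     [1, 1, 1, 1, -1, -1, -1, -1]]
x = [[sum(col) for col in zip(*e[:m + 1])] for m in range(4)]
print(lopc_edges(x, threshold=0.3))  # [(0, 1), (1, 2), (2, 3)]
```

In this toy chain, thresholding the plain correlations at 0.3 would keep all six pairs, whereas conditioning on at most two other variables removes the three indirect edges. Enumerating all conditioning pairs costs on the order of p^4 coefficient evaluations for p variables, which is the price paid for never inverting the full covariance matrix.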