Gene Network Reconstruction by Integration of Prior Biological Knowledge.

Author information

Li Yupeng, Jackson Scott A

Affiliations

Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, Georgia; Department of Statistics, University of Georgia, Athens, Georgia 30602.

Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, Georgia.

Publication information

G3 (Bethesda). 2015 Mar 30;5(6):1075-9. doi: 10.1534/g3.115.018127.

Abstract

With the development of high-throughput genomic technologies, large genome-wide datasets have been collected, and integrating these datasets should provide large-scale, multidimensional, and insightful views of biological systems. We developed a method for constructing gene association networks from gene expression data that integrates a variety of biological resources. Assuming that gene expression data follow a multivariate Gaussian distribution, the graphical lasso (glasso) algorithm estimates a sparse inverse covariance matrix by applying a lasso (L1) penalty. The entries of the inverse covariance matrix can be interpreted as direct associations between gene pairs in the gene association network. In our work, instead of a single penalty, different penalty values were applied to gene pairs according to a priori knowledge of whether the two genes should be connected. The a priori information can be calculated or retrieved from other biological data, e.g., Gene Ontology similarity, protein-protein interactions, or gene regulatory networks. By incorporating prior knowledge, the weighted graphical lasso (wglasso) outperforms the original glasso both in simulations and on data from Arabidopsis. Simulation studies show that even when some prior knowledge is incorrect, the overall quality of the wglasso network is still higher than that obtained without this information, i.e., with glasso.
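
The idea of pair-specific penalties can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the objective -log det(Theta) + tr(S Theta) + sum_ij lambda_ij |Theta_ij| and solves it with a generic ADMM scheme in NumPy. The function names `prior_to_penalty` and `weighted_glasso`, the linear rule lambda_ij = base_penalty * (1 - prior_ij), and all numeric settings are assumptions made for illustration; the abstract does not state how prior knowledge is mapped to penalty values.

```python
import numpy as np


def prior_to_penalty(prior, base_penalty):
    """Map prior edge confidence in [0, 1] to a pairwise L1 penalty.

    Assumed (hypothetical) rule: penalty_ij = base_penalty * (1 - prior_ij),
    so gene pairs with strong prior support are penalized less.
    The diagonal is left unpenalized.
    """
    prior = np.asarray(prior, dtype=float)
    penalty = base_penalty * (1.0 - prior)
    penalty = (penalty + penalty.T) / 2.0  # enforce symmetry
    np.fill_diagonal(penalty, 0.0)
    return penalty


def weighted_glasso(S, penalty, rho=1.0, n_iter=500):
    """ADMM sketch for a graphical lasso with an elementwise penalty matrix.

    S is the sample covariance matrix; penalty has the same shape as S.
    Returns the sparse estimate of the inverse covariance matrix.
    """
    p = S.shape[0]
    Z = np.zeros((p, p))
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: closed form through an eigendecomposition
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: soft-threshold each entry by its own penalty
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - penalty / rho, 0.0)
        # Scaled dual update
        U = U + Theta - Z
    return Z


# Toy usage: a 5-gene chain network with simulated expression data.
rng = np.random.default_rng(0)
off = np.full(4, -0.8)
true_precision = 2.0 * np.eye(5) + np.diag(off, 1) + np.diag(off, -1)
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(true_precision), size=200)
S = np.cov(X, rowvar=False)

# Hypothetical prior: 0.9 for true edges, 0.1 for all other pairs.
prior = np.where(true_precision != 0, 0.9, 0.1)

uniform = prior_to_penalty(np.zeros((5, 5)), base_penalty=0.2)  # plain glasso
weighted = prior_to_penalty(prior, base_penalty=0.2)            # weighted variant

mask = ~np.eye(5, dtype=bool)
for name, lam in [("uniform penalty", uniform), ("weighted penalty", weighted)]:
    est = weighted_glasso(S, lam)
    print(name, "-> nonzero off-diagonal entries:", int(np.count_nonzero(est[mask])))
```

Setting every prior value to zero reduces the penalty matrix to a single constant off the diagonal, which corresponds to the ordinary glasso; the comparison printed at the end only contrasts the two settings on toy data and says nothing about the performance reported in the paper.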
