Prior knowledge driven Granger causality analysis on gene regulatory network discovery.

Authors

Yao Shun, Yoo Shinjae, Yu Dantong

Affiliations

Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, 11790, NY, USA.

Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.

Publication information

BMC Bioinformatics. 2015 Aug 28;16:273. doi: 10.1186/s12859-015-0710-1.

Abstract

BACKGROUND

Our study focuses on discovering gene regulatory networks from time-series gene expression data using the Granger causality (GC) model. However, in biological datasets the number of available time points (T) is usually much smaller than the number of target genes (n). The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n >> T.

RESULTS

In this study, we propose a new method, CGC-2SPR (CGC using two-step prior Ridge regularization), which addresses this problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, CGC-2SPR showed a significant improvement in accuracy over other widely used GC modeling methods (PGC, Ridge, and Lasso) and mutual information (MI)-based methods (MRNET and ARACNE). In addition, we applied CGC-2SPR to a real biological dataset, the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods.
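To make the idea of prior-informed Ridge regularization concrete, the following is a minimal sketch and not the published CGC-2SPR procedure. It assumes a lag-1 linear model and a single-step fit in which candidate edges supported by prior knowledge receive a weaker penalty than unsupported ones. The function name prior_ridge_var1 and the parameters lam_prior and lam_other are hypothetical, introduced here only for illustration.

```python
import numpy as np

def prior_ridge_var1(expr, prior, lam_prior=0.1, lam_other=10.0):
    """Lag-1 Ridge fit with edge-specific penalties (illustrative only).

    expr  : (T, n) array, rows are time points, columns are genes.
    prior : (n, n) boolean array; prior[i, j] is True when prior
            knowledge supports an edge from gene j to gene i.
    Returns A with A[i, j] = lag-1 coefficient of gene j on gene i.
    """
    X = expr[:-1]            # predictors: expression at time t-1
    Y = expr[1:]             # responses: expression at time t
    n = expr.shape[1]
    gram = X.T @ X
    A = np.zeros((n, n))
    for i in range(n):
        # Assumption: weaker shrinkage on edges backed by prior knowledge.
        penalty = np.where(prior[i], lam_prior, lam_other)
        # Closed-form minimizer of ||y - X a||^2 + sum_j penalty_j * a_j^2.
        A[i] = np.linalg.solve(gram + np.diag(penalty), X.T @ Y[:, i])
    return A

# Example with n >> T: 8 time points, 20 genes, one assumed prior edge (3 -> 0).
rng = np.random.default_rng(0)
expr_demo = rng.normal(size=(8, 20))
prior_demo = np.zeros((20, 20), dtype=bool)
prior_demo[0, 3] = True
A_demo = prior_ridge_var1(expr_demo, prior_demo)
```

Because the penalty makes the linear system invertible even when n >> T, the fit stays well defined with only a handful of time points; how the penalties are actually chosen in CGC-2SPR is described in the paper, not in this sketch.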

CONCLUSIONS

In our research, we noticed a "1+1>2" effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on the causality networks, we made a functional prediction that the Abm1 gene (whose function was previously unknown) might be related to the yeast's response to different levels of glucose. Our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction of systems biology. Furthermore, we proposed a Monte Carlo significance estimation (MCSE) method to calculate edge significance, which gives statistical meaning to the discovered causality networks. All of our data and source code will be available at https://bitbucket.org/dtyu/granger-causality/wiki/Home.
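The abstract does not spell out how MCSE works, so the sketch below is only a generic Monte Carlo permutation test for edge significance, not the authors' method. It assumes a lag-1 Ridge coefficient as the edge statistic and builds a null distribution by shuffling each gene's time course independently. The names edge_stat and monte_carlo_pvalues, and the parameters n_draws and lam, are hypothetical.

```python
import numpy as np

def edge_stat(expr, lam=1.0):
    """Absolute lag-1 Ridge coefficients; result[i, j] is gene j -> gene i."""
    X, Y = expr[:-1], expr[1:]
    n = expr.shape[1]
    # Solve for all target genes at once; coef[j, i] is the effect of j on i.
    coef = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)
    return np.abs(coef.T)

def monte_carlo_pvalues(expr, n_draws=200, lam=1.0, seed=0):
    """Empirical p-value per edge from time-shuffled surrogate data."""
    rng = np.random.default_rng(seed)
    observed = edge_stat(expr, lam)
    exceed = np.zeros_like(observed)
    for _ in range(n_draws):
        # Shuffle each gene's time course independently, which destroys
        # temporal dependence while keeping each gene's value distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in expr.T])
        exceed += edge_stat(shuffled, lam) >= observed
    # Add-one correction so that no p-value is exactly zero.
    return (1.0 + exceed) / (1.0 + n_draws)
```

With this construction the smallest attainable p-value is 1 / (1 + n_draws), so the number of draws bounds how significant any edge can appear.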

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c197/4551367/6ef9c2929344/12859_2015_710_Fig1_HTML.jpg
