Deeter Anthony, Dalman Mark, Haddad Joseph, Duan Zhong-Hui
Integrated Bioscience, University of Akron, Akron, Ohio, United States of America.
Department of Computer Science, University of Akron, Akron, Ohio, United States of America.
PLoS One. 2017 Oct 19;12(10):e0186004. doi: 10.1371/journal.pone.0186004. eCollection 2017.
The PubMed database offers an extensive set of publication data that is useful but inherently difficult to exploit without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mined from PubMed publication abstracts, we verify that, regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions but also has the potential to hypothesize new ones. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
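The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract does not define network resolution precisely, so the code below assumes it acts as a threshold on the fraction of individual networks that contain a given directed edge. The function name `consensus_network`, the edge representation, and the toy networks are all hypothetical; they are not the authors' implementation or results, and the Bayesian network learning step itself is not shown.

```python
from collections import Counter
from typing import List, Set, Tuple

# A directed edge between two gene symbols (parent, child).
Edge = Tuple[str, str]


def consensus_network(networks: List[Set[Edge]], resolution: float) -> Set[Edge]:
    """Keep every edge present in at least `resolution` (a fraction in [0, 1])
    of the input networks.

    ASSUMPTION: "network resolution" is modeled here as an edge-frequency
    threshold; the abstract does not state its exact definition.
    """
    if not 0.0 <= resolution <= 1.0:
        raise ValueError("resolution must lie in [0, 1]")
    if not networks:
        return set()

    # Count in how many networks each edge appears.
    counts = Counter(edge for net in networks for edge in net)
    total = len(networks)
    return {edge for edge, n in counts.items() if n / total >= resolution}


# Toy input: three made-up networks, standing in for Bayesian networks
# learned under different randomized topological orderings.
toy_networks = [
    {("JAK", "STAT"), ("PI3K", "AKT")},
    {("JAK", "STAT"), ("RAS", "AKT")},
    {("JAK", "STAT"), ("PI3K", "AKT"), ("AKT", "mTOR")},
]

if __name__ == "__main__":
    for r in (0.3, 0.6, 1.0):
        print(r, sorted(consensus_network(toy_networks, r)))
```

Under this assumption, raising the resolution keeps only the edges that recur across most orderings, while lowering it admits rarer edges that may be treated as candidate hypotheses.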