
Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts.

Author information

Roy Sujoy, Yun Daqing, Madahian Behrouz, Berry Michael W, Deng Lih-Yuan, Goldowitz Daniel, Homayouni Ramin

Affiliations

Bioinformatics Program, University of Memphis, Memphis, TN, United States.

Center for Translational Informatics, University of Memphis, Memphis, TN, United States.

Publication information

Front Bioeng Biotechnol. 2017 Aug 28;5:48. doi: 10.3389/fbioe.2017.00048. eCollection 2017.

Abstract

In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using NTF across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term-gene-TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene-TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as a gold standard, we show that the precision of our method for predicting gene-TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications for genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature-based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
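
The decomposition step described above can be illustrated with a small sketch. The abstract does not say which NTF algorithm or dominant-entry rule the authors used, so everything here is an assumption made for illustration: a rank-R non-negative CP model fitted with Lee-Seung style multiplicative updates, a dense toy tensor instead of the sparse 106,895 × 7,695 × 994 one, and a simple top-k rule for picking dominant entries. The names `ntf`, `reconstruct`, and `top_entries` are hypothetical helpers, not part of TREMEL.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other two axes."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def ntf(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-mode tensor (assumed algorithm:
    multiplicative updates, which keep every factor entry non-negative)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            a, b = [factors[m] for m in range(3) if m != mode]
            numer = unfold(x, mode) @ khatri_rao(a, b)
            # Gram matrix of the Khatri-Rao product is the Hadamard product of Grams.
            denom = factors[mode] @ ((a.T @ a) * (b.T @ b)) + eps
            factors[mode] *= numer / denom
    return factors

def reconstruct(factors):
    """Sum of the rank-one sub-tensors defined by the factor columns."""
    terms, genes, tfs = factors
    return np.einsum("ir,jr,kr->ijk", terms, genes, tfs)

def top_entries(factor, k):
    """Indices of the k largest loadings in each component (assumed selection rule)."""
    return [np.argsort(-factor[:, r])[:k].tolist() for r in range(factor.shape[1])]

if __name__ == "__main__":
    # Toy term x gene x TF tensor with two disjoint blocks standing in for two modules.
    x = np.zeros((6, 4, 3))
    x[:3, :2, 0] = 1.0
    x[3:, 2:, 1:] = 2.0
    term_f, gene_f, tf_f = ntf(x, rank=2)
    print("relative error:", np.linalg.norm(x - reconstruct([term_f, gene_f, tf_f])) / np.linalg.norm(x))
    print("dominant terms per component:", top_entries(term_f, 3))
    print("dominant genes per component:", top_entries(gene_f, 2))
```

Each column index `r` corresponds to one sub-tensor; reading off the dominant terms, genes, and TFs for the same `r` gives one candidate module in the sense of an ATM. Repeating the fit over several values of `rank` would mirror the use of multiple approximation ranks, though how the authors combined results across ranks is not stated in the abstract.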


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15b1/5581332/3fe7a8c553a4/fbioe-05-00048-g001.jpg
