The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

Affiliation

Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium.

Publication information

Plant Cell. 2013 Mar;25(3):794-807. doi: 10.1105/tpc.112.108753. Epub 2013 Mar 26.

Abstract

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
