PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

Authors

Döring Kersten, Grüning Björn A, Telukunta Kiran K, Thomas Philippe, Günther Stefan

Affiliations

Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs University, 79104 Freiburg, Germany.

Bioinformatics, Institute of Computer Science, Albert-Ludwigs University, 79110 Freiburg, Germany.

Publication information

PLoS One. 2016 Oct 5;11(10):e0163794. doi: 10.1371/journal.pone.0163794. eCollection 2016.

DOI:10.1371/journal.pone.0163794

PMID:27706202

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5051953/

Abstract

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
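
The core idea in the abstract, loading PubMed XML citations into a relational database and building a full text index over them, can be illustrated with a minimal sketch. This is not PubMedPortable's code: the abstract does not name a database or indexing engine, so the sketch uses Python's built-in `sqlite3` and a plain dictionary as stand-ins. The sample XML, the `citation` table, and all function names are hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed PubMed-style XML; real records carry many more fields.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Gemcitabine in pancreatic cancer</ArticleTitle>
        <Abstract><AbstractText>Gemcitabine is a standard treatment.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Named entity recognition for proteins</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def text_of(element):
    """Return all text inside an element (including inline markup), or ''."""
    if element is None:
        return ""
    return "".join(element.itertext()).strip()


def parse_citations(xml_text):
    """Yield (pmid, title, abstract) tuples from a PubMed-style XML string."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = text_of(citation.find("Article/ArticleTitle"))
        # An abstract may be missing or split into several AbstractText parts.
        parts = citation.findall("Article/Abstract/AbstractText")
        abstract = " ".join(text_of(part) for part in parts)
        yield pmid, title, abstract


def build_database(rows):
    """Store citations in an in-memory SQLite table (stand-in for a real RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.executemany("INSERT INTO citation VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def build_index(conn):
    """Build a toy inverted index mapping lowercase tokens to sets of PMIDs."""
    index = defaultdict(set)
    query = "SELECT pmid, title, abstract FROM citation"
    for pmid, title, abstract in conn.execute(query):
        for token in f"{title} {abstract}".lower().split():
            index[token.strip(".,;:()")].add(pmid)
    return index


conn = build_database(list(parse_citations(SAMPLE_XML)))
index = build_index(conn)
hits = sorted(index["gemcitabine"])
print(hits)
```

Keeping the relational store and the text index separate mirrors the two components the abstract describes: the table answers structured queries by PMID or field, while the index answers keyword searches. A production setup would replace both stand-ins with a dedicated database server and search engine.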
