Beyond accuracy: creating interoperable and scalable text-mining web services.

Author information

Wei Chih-Hsuan, Leaman Robert, Lu Zhiyong

Affiliation

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), Bethesda, MD 20894, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):1907-10. doi: 10.1093/bioinformatics/btv760. Epub 2016 Feb 16.

DOI: 10.1093/bioinformatics/btv760

PMID: 26883486

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908316/

Abstract

The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text.
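The abstract names BioC as one of the formats the services accept, which is what lets their output plug into other text-processing pipelines. As a rough illustration of what such a document carries, here is a minimal Python sketch, assuming the standard BioC XML layout (collection, document, passage, and annotation elements with infon and location children). It does not call the tmTools service. The infon keys "type" and "identifier", the example sentence, and the normalized identifiers are illustrative assumptions, not the service's actual output.

```python
import xml.etree.ElementTree as ET


def build_bioc_collection(doc_id, text, annotations, source="example"):
    """Build a minimal BioC XML collection: one document, one passage.

    `annotations` is a list of (mention, entity_type, identifier) tuples.
    Offsets are found with str.find (first occurrence only), so every
    mention must appear in `text`. The infon keys "type" and "identifier"
    are assumed conventions for this sketch, not a documented tmTools schema.
    """
    collection = ET.Element("collection")
    # BioC expects source, date and key before any document.
    ET.SubElement(collection, "source").text = source
    ET.SubElement(collection, "date").text = ""
    ET.SubElement(collection, "key").text = ""

    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id

    # A passage carries its offset within the document plus its text.
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text

    for i, (mention, entity_type, identifier) in enumerate(annotations):
        start = text.find(mention)
        if start < 0:
            raise ValueError(f"mention {mention!r} not found in passage text")
        ann = ET.SubElement(passage, "annotation", id=str(i))
        ET.SubElement(ann, "infon", key="type").text = entity_type
        ET.SubElement(ann, "infon", key="identifier").text = identifier
        # Passage offset is 0, so the document offset equals the index in `text`.
        ET.SubElement(ann, "location", offset=str(start), length=str(len(mention)))
        ET.SubElement(ann, "text").text = mention
    return collection


def read_annotations(xml_string):
    """Return (mention, type, identifier, offset, length) for each annotation."""
    root = ET.fromstring(xml_string)
    rows = []
    for ann in root.iter("annotation"):
        infons = {infon.get("key"): infon.text for infon in ann.findall("infon")}
        location = ann.find("location")
        rows.append((
            ann.findtext("text"),
            infons.get("type"),
            infons.get("identifier"),
            int(location.get("offset")),
            int(location.get("length")),
        ))
    return rows


if __name__ == "__main__":
    sentence = "Mutations in BRCA1 increase breast cancer risk."
    collection = build_bioc_collection(
        "example-1",
        sentence,
        # Illustrative normalized identifiers (NCBI Gene, MeSH).
        [("BRCA1", "Gene", "672"), ("breast cancer", "Disease", "MESH:D001943")],
    )
    xml_string = ET.tostring(collection, encoding="unicode")
    for row in read_annotations(xml_string):
        print(row)
    # ('BRCA1', 'Gene', '672', 13, 5)
    # ('breast cancer', 'Disease', 'MESH:D001943', 28, 13)
```

In this sketch each annotation records a character offset and length alongside its normalized identifier, so any consumer that reads the same layout can recover the mention by slicing the passage text without knowing which tagger produced it.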

AVAILABILITY AND IMPLEMENTATION

Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl

CONTACT

Zhiyong.Lu@nih.gov

References

GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains.

Biomed Res Int. 2015;2015:918710. doi: 10.1155/2015/918710. Epub 2015 Aug 25.

tmChem: a high performance approach for chemical named entity recognition and normalization.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S3. doi: 10.1186/1758-2946-7-S1-S3. eCollection 2015.

The CHEMDNER corpus of chemicals and drugs and its annotation principles.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S2. doi: 10.1186/1758-2946-7-S1-S2. eCollection 2015.

BioC interoperability track overview.

Database (Oxford). 2014 Jun 30;2014. doi: 10.1093/database/bau053. Print 2014.

Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Database (Oxford). 2014 Jun 10;2014. doi: 10.1093/database/bau050. Print 2014.

NCBI disease corpus: a resource for disease name recognition and concept normalization.

J Biomed Inform. 2014 Feb;47:1-10. doi: 10.1016/j.jbi.2013.12.006. Epub 2014 Jan 3.

DNorm: disease name normalization with pairwise learning to rank.

Bioinformatics. 2013 Nov 15;29(22):2909-17. doi: 10.1093/bioinformatics/btt474. Epub 2013 Aug 21.

BeCAS: biomedical concept recognition services and visualization.

Bioinformatics. 2013 Aug 1;29(15):1915-6. doi: 10.1093/bioinformatics/btt317. Epub 2013 Jun 4.

PubTator: a web-based text mining tool for assisting biocuration.

Nucleic Acids Res. 2013 Jul;41(Web Server issue):W518-22. doi: 10.1093/nar/gkt441. Epub 2013 May 22.

tmVar: a text mining approach for extracting sequence variants in biomedical literature.

Bioinformatics. 2013 Jun 1;29(11):1433-9. doi: 10.1093/bioinformatics/btt156. Epub 2013 Apr 5.
