Wei Chih-Hsuan, Leaman Robert, Lu Zhiyong
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), Bethesda, MD 20894, USA.
Bioinformatics. 2016 Jun 15;32(12):1907-10. doi: 10.1093/bioinformatics/btv760. Epub 2016 Feb 16.
The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and a growing rate of publication, research on automated text processing is becoming increasingly important. Here we report our recently developed web-based text-mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm for diseases, GNormPlus for genes, SR4GN for species, tmChem for chemicals and tmVar for mutations). They also offer a batch-processing mode that can handle arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our services interoperable and to simplify integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and we use a computer cluster to process large requests of arbitrary text.
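Arbitrary text input in BioC format can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` and wraps plain-text documents as a BioC XML collection, with one passage per document. The function name `build_bioc_collection` and the placeholder `source`, `date`, `key` and passage-type values are hypothetical. Consult the service documentation for the exact fields it expects.

```python
import xml.etree.ElementTree as ET


def _sub(parent, tag, text=None):
    """Append a child element to parent, optionally setting its text content."""
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = text
    return el


def build_bioc_collection(docs, source="example", date="", key=""):
    """Wrap (doc_id, text) pairs as a minimal BioC XML collection string.

    Hypothetical helper: each input text becomes one <document> holding a
    single <passage> at offset 0. The source/date/key defaults and the
    passage type "abstract" are placeholders, not values required by tmTools.
    """
    collection = ET.Element("collection")
    # BioC collection header: source, date and key precede the documents.
    _sub(collection, "source", source)
    _sub(collection, "date", date)
    _sub(collection, "key", key)
    for doc_id, text in docs:
        document = _sub(collection, "document")
        _sub(document, "id", doc_id)
        passage = _sub(document, "passage")
        # Passage-level infon describing the passage type (placeholder value).
        infon = _sub(passage, "infon", "abstract")
        infon.set("key", "type")
        # Offset of the passage within the document; a single passage starts at 0.
        _sub(passage, "offset", "0")
        _sub(passage, "text", text)
    return ET.tostring(collection, encoding="unicode")


if __name__ == "__main__":
    print(build_bioc_collection([("doc1", "BRCA1 mutations are linked to breast cancer.")]))
```

A BioC file written to disk usually also begins with an XML declaration and a DOCTYPE referencing `BioC.dtd`. The sketch omits both so that the returned string can be parsed directly with `ET.fromstring`.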
Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl