LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.

Author information

Hemati Wahed, Mehler Alexander

Affiliations

Text Technology Lab, Goethe-University Frankfurt, Robert-Mayer-Straße 10, 60325, Frankfurt am Main, Germany.

Publication information

J Cheminform. 2019 Jan 10;11(1):3. doi: 10.1186/s13321-018-0327-2.

Abstract

BACKGROUND

Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
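Recasting NER as sequence labeling means assigning one tag to every token. The following is a minimal sketch of that conversion using the common BIO scheme. The whitespace tokenizer, the example sentence, and the single `CHEMICAL` label are illustrative assumptions, not the tokenization or label set used by the authors.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]    # (text, start offset, end offset)
Entity = Tuple[int, int, str]   # (start offset, end offset, entity type)


def tokenize(text: str) -> List[Token]:
    """Whitespace tokenizer that keeps character offsets (assumed for illustration)."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word, start, end))
        pos = end
    return tokens


def spans_to_bio(tokens: List[Token], entities: List[Entity]) -> List[str]:
    """Turn character-level entity spans into one BIO tag per token."""
    labels = ["O"] * len(tokens)
    for ent_start, ent_end, ent_type in entities:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if it lies fully inside the span.
            if tok_start >= ent_start and tok_end <= ent_end:
                labels[i] = ("I-" if inside else "B-") + ent_type
                inside = True
    return labels


# Hypothetical sentence and annotations, made up for this example.
text = "Aspirin inhibits cyclooxygenase in acetic acid solution"
entities = [(0, 7, "CHEMICAL"), (35, 46, "CHEMICAL")]

tokens = tokenize(text)
labels = spans_to_bio(tokens, entities)
for (word, _, _), label in zip(tokens, labels):
    print(f"{word}\t{label}")
```

Once mentions are encoded this way, any sequence labeler can be trained on the token and tag pairs, which is what makes it possible to combine the outputs of several taggers token by token.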

RESULTS

We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores feature information that is modeled by an attention mechanism. In a series of experiments, LSTMVoter outperforms each of the extractors it integrates. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
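For readers unfamiliar with the metric, the sketch below shows one common way to compute an entity-level F1-score from BIO tag sequences: a predicted mention counts as correct only if its boundaries and type exactly match a gold mention. It is an illustration under that assumption; the official BioCreative evaluation scripts may differ in detail, and the tag sequences are made up.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, type)


def bio_to_spans(labels: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, ent_type = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        # A span starts at B-, or at an I- that does not continue the open span.
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != ent_type
        )
        if start is not None and (label == "O" or starts_new):
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        if starts_new:
            start, ent_type = i, label[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted tags for a seven-token sentence.
gold = ["B-CHEMICAL", "O", "O", "O", "B-CHEMICAL", "I-CHEMICAL", "O"]
pred = ["B-CHEMICAL", "O", "O", "O", "B-CHEMICAL", "O", "O"]
score = entity_f1(gold, pred)
print(f"F1 = {score:.2f}")
```

In this example the second predicted mention is cut short, so it counts as both a false positive and a false negative under exact matching, and only one of two mentions is credited.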

AVAILABILITY AND IMPLEMENTATION

Data and code are available at https://github.com/texttechnologylab/LSTMVoter.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f996/6689880/be848b15c8b3/13321_2018_327_Fig1_HTML.jpg
