PharmacoNER Tagger: a deep learning-based tool for automatically finding chemicals and drugs in Spanish medical texts.

Authors

Armengol-Estapé Jordi, Soares Felipe, Marimon Montserrat, Krallinger Martin

Affiliations

Universitat Politècnica de Catalunya (UPC), 08034 Barcelona, Spain.

Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain.

Publication information

Genomics Inform. 2019 Jun;17(2):e15. doi: 10.5808/GI.2019.17.2.e15. Epub 2019 Jun 19.

Abstract

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key for the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions or symptoms. The identification of drug mentions is also a prerequisite for handling complex event types such as drug dosage recognition, duration of medical treatments or drug repurposing. Formally, this task is known as named entity recognition (NER), meaning automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, for chemical entity recognition (CER), techniques based on hand-crafted rules and graph-based models can provide adequate performance. In recent years, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most tasks involving natural language are usually obtained with artificial neural networks. Competitive resources for drug name recognition in English medical texts are already available and heavily used, while for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts, and extend the neural network so that it can take into account additional features apart from the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug and CER promoted by the Spanish national plan for the advancement of language technologies (Plan TL). PharmacoNER Tagger can be accessed at https://github.com/PlanTL-SANIDAD/PharmacoNER.
