Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text.

Affiliation

Center for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal.

Publication information

J Med Syst. 2020 Feb 28;44(4):77. doi: 10.1007/s10916-020-1542-8.

Abstract

Electronic Medical Records (EMRs) are written in an unstructured way, often in natural language. Information Extraction (IE) can be used to acquire knowledge from such texts, including the automatic recognition of meaningful entities through Named Entity Recognition (NER) models. However, while most previous work targeted English, this study tested different methods on Portuguese text, specifically in the domain of Neurology, and drew conclusions from the results. The paper compares Conditional Random Fields (CRF), bidirectional Long Short-Term Memory - Conditional Random Fields (BiLSTM-CRF), and a BiLSTM-CRF with residual learning connections, using not only Portuguese texts from medical journals but also texts from the Coimbra Hospital and Universitary Centre (CHUC) Neurology Service. Furthermore, the performance of BiLSTM-CRF models using word embeddings (WEs) trained on clinical text was compared with that of models using WEs trained on general-language text. Deep learning models achieved F1-Scores of nearly 83% and 75% for relaxed and strict evaluation, respectively, on texts extracted from the medical journal. On texts collected from the Hospital, the same models achieved F1-Scores of nearly 71% and 62%. This work concludes that deep learning models outperform shallow learning models and that in-domain WEs yield better results than general-language WEs, even when the latter are trained on much more text than the former. Furthermore, the results show that it is possible to extract information from Hospital clinical texts with models trained on clinical cases extracted from medical journals, which are openly available. Nevertheless, such results still require a healthcare technician to check whether the information was extracted correctly.
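The abstract reports F1-Scores under both relaxed and strict evaluation but does not define the two criteria. The sketch below illustrates one common reading, and it is an assumption rather than the authors' evaluation code: strict matching requires a predicted span to have exactly the same boundaries and label as a gold span, while relaxed matching accepts any token overlap with a gold span of the same label. The entity labels and token offsets are hypothetical.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); an illustrative format,
# not one taken from the paper.
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """True when two spans carry the same label and share at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Entity], pred: List[Entity], relaxed: bool = False) -> float:
    """Entity-level F1. Strict needs identical spans; relaxed accepts overlap."""
    if relaxed:
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        matched_pred = sum(p in gold_set for p in pred)
        matched_gold = sum(g in pred_set for g in gold)

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the first prediction covers only part of the gold span.
gold = [(0, 2, "Condition"), (5, 6, "Therapeutic")]
pred = [(0, 1, "Condition"), (5, 6, "Therapeutic")]

strict_f1 = f1_score(gold, pred)                 # partial span is not counted
relaxed_f1 = f1_score(gold, pred, relaxed=True)  # partial span is counted
print(f"strict F1 = {strict_f1:.2f}, relaxed F1 = {relaxed_f1:.2f}")
```

Under these definitions a model is never scored lower by relaxed evaluation than by strict evaluation, which matches the direction of the gap reported in the abstract (nearly 83% versus 75%, and 71% versus 62%).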

