Solarte Pabón Oswaldo, Montenegro Orlando, Torrente Maria, Rodríguez González Alejandro, Provencio Mariano, Menasalvas Ernestina
Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain.
Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia.
PeerJ Comput Sci. 2022 Mar 7;8:e913. doi: 10.7717/peerj-cs.913. eCollection 2022.
Detecting negation and uncertainty is crucial for medical text mining applications; otherwise, extracted information can be incorrectly identified as real or factual events. Although several approaches have been proposed to detect negation and uncertainty in clinical texts, most efforts have focused on the English language. Most proposals developed for Spanish have focused on negation detection and do not address uncertainty. In this paper, we propose a deep learning-based approach for both negation and uncertainty detection in clinical texts written in Spanish. The proposed approach explores two deep learning methods to achieve this goal: (i) Bidirectional Long Short-Term Memory with a Conditional Random Field layer (BiLSTM-CRF) and (ii) Bidirectional Encoder Representations from Transformers (BERT). The approach was evaluated using NUBES and IULA, two public corpora for the Spanish language. The results showed F-scores of 92% and 80% in the scope recognition task for negation and uncertainty, respectively. We also present the results of a validation process conducted on a real-life annotated dataset of clinical notes from cancer patients. The proposed approach shows the feasibility of deep learning-based methods for detecting negation and uncertainty in Spanish clinical texts. Experiments also showed that this approach improves performance in the scope recognition task compared with other proposals in the biomedical domain.
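Both BiLSTM-CRF and BERT are commonly used as token-level sequence labellers, so scope recognition can be pictured as tagging each token and then scoring the recovered spans. The sketch below assumes a BIO tagging scheme and exact-match span F-score; the label names `NEG` (cue) and `NSCO` (scope), the helper functions, and the example sentence are hypothetical illustrations, not the paper's annotation scheme or its evaluation code.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (label, start, end) with end exclusive.
Span = Tuple[str, int, int]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping labelled spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for label, start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags back into spans; a stray I- tag opens a new span."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and start is not None and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((label, start, i))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span F-score: a span counts only if label and boundaries match."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "sin" is a negation cue, "signos de infección" its scope.
tokens = ["Paciente", "sin", "signos", "de", "infección", "."]
gold = [("NEG", 1, 2), ("NSCO", 2, 5)]
gold_tags = spans_to_bio(len(tokens), gold)
# A hypothetical model output that truncates the scope by one token.
pred_tags = ["O", "B-NEG", "B-NSCO", "I-NSCO", "O", "O"]

print(gold_tags)
print(span_f1(set(gold), bio_to_spans(pred_tags)))  # 0.5
```

Under exact-match scoring, the truncated scope earns no credit even though the cue is correct, which is one reason scope F-scores tend to be stricter than cue F-scores; the paper's own metric may be computed differently.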