Weng Kung-Hsun, Liu Chung-Feng, Chen Chia-Jung
Department of Medical Imaging, Chi Mei Medical Center, Chiali, Tainan, Taiwan.
Department of Medical Research, Chi Mei Medical Center, Tainan, Taiwan.
JMIR Med Inform. 2023 Apr 25;11:e46348. doi: 10.2196/46348.
Negative and speculative statements unrelated to abnormal findings can trigger false-positive alarms when laboratory information systems automatically highlight or flag radiology reports.
This internal validation study evaluated the performance of natural language processing methods (NegEx, NegBio, NegBERT, and transformer models) in detecting these statements.
We annotated all negative and speculative statements unrelated to abnormal findings in the reports. In experiment 1, we fine-tuned 9 transformer models (ALBERT [A Lite Bidirectional Encoder Representations from Transformers], BERT [Bidirectional Encoder Representations from Transformers], DeBERTa [Decoding-Enhanced BERT With Disentangled Attention], DistilBERT [Distilled version of BERT], ELECTRA [Efficiently Learning an Encoder That Classifies Token Replacements Accurately], ERNIE [Enhanced Representation through Knowledge Integration], RoBERTa [Robustly Optimized BERT Pretraining Approach], SpanBERT, and XLNet) and compared their performance using precision, recall, accuracy, and F-scores. In experiment 2, we compared the best model from experiment 1 with 3 established negation- and speculation-detection algorithms (NegEx, NegBio, and NegBERT).
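To make the experiment 1 setup concrete, the following is a minimal sketch of fine-tuning a transformer (here ALBERT) as a binary sentence classifier that flags negative or speculative statements, using the Hugging Face transformers library. The checkpoint name, label scheme, example sentences, and hyperparameters are illustrative assumptions, not the study's exact configuration.

```python
# Minimal sketch, assuming a binary sentence-classification framing.
# Checkpoint, labels, and hyperparameters are illustrative only.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2)  # 0 = affirmative, 1 = negated/speculative

# Hypothetical annotated sentences standing in for the report corpus.
texts = ["No evidence of pulmonary embolism.",
         "A 2 cm mass is seen in the right hepatic lobe."]
labels = [1, 0]

encodings = tokenizer(texts, truncation=True, padding=True,
                      return_tensors="pt")

class ReportDataset(torch.utils.data.Dataset):
    """Wraps tokenized sentences and gold labels for the Trainer API."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="albert-negspec", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ReportDataset(encodings, labels),
)
trainer.train()
```

The same recipe applies to the other 8 models by swapping the checkpoint name, which is what makes a like-for-like comparison across architectures straightforward.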
Our study collected 6000 radiology reports from 3 branches of Chi Mei Hospital, covering multiple imaging modalities and body parts. A total of 15.01% (105,755/704,512) of words and 39.45% (4529/11,480) of important diagnostic keywords occurred in negative or speculative statements unrelated to abnormal findings. In experiment 1, all models achieved an accuracy of >0.98 and an F-score of >0.90 on the test data set, with ALBERT performing best (accuracy=0.991; F-score=0.958). In experiment 2, ALBERT outperformed the optimized NegEx, NegBio, and NegBERT methods in overall performance (accuracy=0.996; F-score=0.991), in predicting whether diagnostic keywords occur in speculative statements unrelated to abnormal findings, and in improving keyword extraction (accuracy=0.996; F-score=0.997).
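For contrast with the fine-tuned transformers, here is a heavily simplified illustration of the rule-based NegEx idea that served as a baseline in experiment 2: a diagnostic keyword is flagged as negated or speculative when a trigger phrase precedes it in the sentence. The trigger list, scoping, and helper function are simplified assumptions, far coarser than the optimized NegEx and NegBio configurations evaluated in the study.

```python
import re

# Simplified NegEx-style check: flag a keyword when a negation or
# speculation trigger phrase precedes it in the same sentence.
# The trigger list and scoping are illustrative assumptions.
TRIGGERS = re.compile(
    r"\b(no|not|without|denies|ruled out|cannot exclude|"
    r"may represent|suspicious for)\b", re.IGNORECASE)

def flagged(sentence: str, keyword: str) -> bool:
    """Return True if `keyword` is preceded by a trigger phrase."""
    idx = sentence.lower().find(keyword.lower())
    if idx == -1:
        return False  # keyword not present in this sentence
    return bool(TRIGGERS.search(sentence[:idx]))

print(flagged("No evidence of pneumothorax.", "pneumothorax"))      # True
print(flagged("Small left pneumothorax is noted.", "pneumothorax")) # False
```

The full NegEx algorithm additionally scopes triggers to a limited word window and distinguishes pre- from post-keyword triggers; this sketch omits both, which is one reason rule-based baselines trail the fine-tuned models reported above.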
The ALBERT deep learning method showed the best performance. Our results represent a significant advancement in the clinical applications of computer-aided notification systems.