Detecting insertion, substitution, and deletion errors in radiology reports using neural sequence-to-sequence models.

Authors

Zech John, Forde Jessica, Titano Joseph J, Kaji Deepak, Costa Anthony, Oermann Eric Karl

Affiliations

Department of Radiology, Icahn School of Medicine, New York, NY, USA.

Project Jupyter, 190 Doe Library, Berkeley, CA, USA.

Publication

Ann Transl Med. 2019 Jun;7(11):233. doi: 10.21037/atm.2018.08.11.

Abstract

BACKGROUND

Errors in grammar, spelling, and usage in radiology reports are common. To automatically detect inappropriate insertions, deletions, and substitutions of words in radiology reports, we proposed using a neural sequence-to-sequence (seq2seq) model.

METHODS

Head CT and chest radiograph reports from Mount Sinai Hospital (MSH) (n=61,722 and 818,978, respectively), Mount Sinai Queens (MSQ) (n=30,145 and 194,309, respectively) and MIMIC-III (n=32,259 and 54,685, respectively) were converted into sentences. Insertions, substitutions, and deletions of words were randomly introduced. Seq2seq models were trained using corrupted sentences as input to predict the original uncorrupted sentences. Three models were trained: one using head CTs from MSH, one using chest radiographs from MSH, and one using head CTs from all three collections. Model performance was assessed across different sites and modalities. A sample of original, uncorrupted sentences was manually reviewed for any error in syntax, usage, or spelling to estimate the real-world proofreading performance of the algorithm.
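
The corruption step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state the corruption rate, how replacement words were sampled, or whether more than one edit was applied per sentence, so the function names, the `p_corrupt` parameter, the single-edit policy, and the example sentences are all assumptions made here.

```python
import random


def corrupt_sentence(words, vocab, rng, p_corrupt=0.5):
    """Apply at most one random word-level edit to a sentence.

    Returns (new_words, was_corrupted). `vocab` must contain at least
    two distinct words so a substitution can always change the word.
    The 50% corruption rate is a placeholder, not a reported value.
    """
    words = list(words)
    if not words or rng.random() >= p_corrupt:
        return words, False

    # Deleting the only word would leave an empty sentence, so skip it.
    ops = ["insert", "substitute"]
    if len(words) > 1:
        ops.append("delete")
    op = rng.choice(ops)

    if op == "insert":
        pos = rng.randrange(len(words) + 1)
        words.insert(pos, rng.choice(vocab))
    elif op == "substitute":
        pos = rng.randrange(len(words))
        candidates = [w for w in vocab if w != words[pos]]
        words[pos] = rng.choice(candidates)
    else:  # delete
        pos = rng.randrange(len(words))
        del words[pos]
    return words, True


def make_training_pairs(sentences, rng, p_corrupt=0.5):
    """Build (input, target, was_corrupted) triples for seq2seq training.

    The input is the possibly corrupted sentence; the target is always
    the original sentence, as in the setup the abstract describes.
    """
    tokenized = [s.split() for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    pairs = []
    for words in tokenized:
        corrupted, flag = corrupt_sentence(words, vocab, rng, p_corrupt)
        pairs.append((" ".join(corrupted), " ".join(words), flag))
    return pairs


# Made-up example sentences, for illustration only.
example_sentences = [
    "no acute intracranial hemorrhage",
    "the ventricles are normal in size",
    "no pleural effusion or pneumothorax",
]
example_pairs = make_training_pairs(example_sentences, random.Random(0))
```

Keeping the `was_corrupted` flag next to each pair makes it possible to score detection later, since sensitivity and specificity on the synthetic test sets depend on knowing which inputs were altered.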

RESULTS

Seq2seq detected 90.3% and 88.2% of corrupted sentences with 97.7% and 98.8% specificity in same-site, same-modality test sets for head CTs and chest radiographs, respectively. Manual review of original, uncorrupted same-site, same-modality head CT sentences demonstrated a seq2seq positive predictive value (PPV) of 0.393 (157/400; 95% CI, 0.346-0.441) and a negative predictive value (NPV) of 0.986 (789/800; 95% CI, 0.976-0.992) for detecting sentences containing real-world errors, with an estimated sensitivity of 0.389 (95% CI, 0.267-0.542) and specificity of 0.986 (95% CI, 0.985-0.987) over n=86,211 uncorrupted training examples.
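
The predictive values above follow directly from the reviewed counts, and the arithmetic can be checked with a short sketch. Two assumptions are made here. First, the abstract does not say which interval method was used; the Wilson score interval is used below because it reproduces the reported PPV and NPV bounds to three decimals. Second, the abstract does not report what fraction of sentences the model flagged, so `flagged_fraction` is a hypothetical input to the second function, which only shows how sensitivity and specificity can be estimated from PPV and NPV when reviewed samples are drawn separately from flagged and unflagged sentences.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def sens_spec_from_predictive_values(ppv, npv, flagged_fraction):
    """Estimate sensitivity and specificity from PPV, NPV, and the
    share of all sentences the model flagged (a hypothetical input)."""
    tp = ppv * flagged_fraction                # flagged, real error
    fp = (1 - ppv) * flagged_fraction          # flagged, no error
    fn = (1 - npv) * (1 - flagged_fraction)    # not flagged, real error
    tn = npv * (1 - flagged_fraction)          # not flagged, no error
    return tp / (tp + fn), tn / (tn + fp)


# Counts reported in the abstract.
ppv = 157 / 400
npv = 789 / 800
ppv_ci = wilson_interval(157, 400)
npv_ci = wilson_interval(789, 800)
print(f"PPV {ppv:.3f} ({ppv_ci[0]:.3f}-{ppv_ci[1]:.3f})")
print(f"NPV {npv:.3f} ({npv_ci[0]:.3f}-{npv_ci[1]:.3f})")
```

Because real-world errors are rare, even a small miss rate among the many unflagged sentences can outnumber the true positives among the few flagged ones, which is how a high NPV can coexist with a modest sensitivity.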

CONCLUSIONS

Seq2seq models can be highly effective at detecting erroneous insertions, deletions, and substitutions of words in radiology reports. To achieve high performance, these models require site- and modality-specific training examples. Incorporating additional targeted training data could further improve performance in detecting real-world errors in reports.
