
Detecting insertion, substitution, and deletion errors in radiology reports using neural sequence-to-sequence models.

Authors

Zech John, Forde Jessica, Titano Joseph J, Kaji Deepak, Costa Anthony, Oermann Eric Karl

Affiliations

Department of Radiology, Icahn School of Medicine, New York, NY, USA.

Project Jupyter, 190 Doe Library, Berkeley, CA, USA.

Publication information

Ann Transl Med. 2019 Jun;7(11):233. doi: 10.21037/atm.2018.08.11.

Abstract

BACKGROUND

Errors in grammar, spelling, and usage in radiology reports are common. To automatically detect inappropriate insertions, deletions, and substitutions of words in radiology reports, we proposed using a neural sequence-to-sequence (seq2seq) model.

METHODS

Head CT and chest radiograph reports from Mount Sinai Hospital (MSH) (n=61,722 and 818,978, respectively), Mount Sinai Queens (MSQ) (n=30,145 and 194,309, respectively), and MIMIC-III (n=32,259 and 54,685, respectively) were converted into sentences. Insertions, substitutions, and deletions of words were randomly introduced. Seq2seq models were trained using corrupted sentences as input to predict the original, uncorrupted sentences. Three models were trained using head CTs from MSH, chest radiographs from MSH, and head CTs from all three collections. Model performance was assessed across different sites and modalities. A sample of original, uncorrupted sentences was manually reviewed for any error in syntax, usage, or spelling to estimate the real-world proofreading performance of the algorithm.
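The abstract does not specify the exact corruption scheme or its parameters; the sketch below illustrates one plausible word-level corruption step of the kind described, where `corrupt_sentence`, the toy `vocab`, and the probability `p` are illustrative assumptions rather than the authors' code.

```python
import random

def corrupt_sentence(tokens, vocab, p=0.1):
    """Randomly insert, substitute, or delete words in a tokenized sentence.

    Each word is deleted, substituted, or followed by an inserted word with
    probability p/3 each; otherwise it is kept unchanged.
    """
    corrupted = []
    for tok in tokens:
        r = random.random()
        if r < p / 3:
            continue                                # deletion: drop the word
        if r < 2 * p / 3:
            corrupted.append(random.choice(vocab))  # substitution: random vocabulary word
            continue
        corrupted.append(tok)                       # keep the word
        if r < p:
            corrupted.append(random.choice(vocab))  # insertion: add a spurious word
    return corrupted

# The seq2seq model is then trained with the corrupted token sequence as the
# input and the original, uncorrupted sequence as the target.
original = "no acute intracranial hemorrhage mass effect or midline shift".split()
vocab = ["no", "acute", "chronic", "intracranial", "hemorrhage", "fracture",
         "effusion", "mass", "effect", "midline", "shift", "or"]
print("original :", " ".join(original))
print("corrupted:", " ".join(corrupt_sentence(original, vocab)))
```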

RESULTS

Seq2seq detected 90.3% and 88.2% of corrupted sentences with 97.7% and 98.8% specificity in same-site, same-modality test sets for head CTs and chest radiographs, respectively. Manual review of original, uncorrupted same-site same-modality head CT sentences demonstrated seq2seq positive predictive value (PPV) 0.393 (157/400; 95% CI, 0.346-0.441) and negative predictive value (NPV) 0.986 (789/800; 95% CI, 0.976-0.992) for detecting sentences containing real-world errors, with estimated sensitivity of 0.389 (95% CI, 0.267-0.542) and specificity 0.986 (95% CI, 0.985-0.987) over n=86,211 uncorrupted training examples.
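As a worked check, the PPV and NPV point estimates follow directly from the manual-review counts quoted above; the confidence intervals and the sensitivity/specificity estimates over the full n=86,211 sentences require the underlying data and are not reproduced here.

```python
# Quick arithmetic check of the manual-review point estimates quoted above;
# the counts are taken directly from the abstract.
flagged_with_real_error = 157   # flagged sentences confirmed to contain a real error
flagged_reviewed = 400          # flagged sentences that were manually reviewed
unflagged_error_free = 789      # unflagged sentences confirmed to be error-free
unflagged_reviewed = 800        # unflagged sentences that were manually reviewed

ppv = flagged_with_real_error / flagged_reviewed   # 157/400 = 0.393
npv = unflagged_error_free / unflagged_reviewed    # 789/800 = 0.986
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```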

CONCLUSIONS

Seq2seq models can be highly effective at detecting erroneous insertions, deletions, and substitutions of words in radiology reports. To achieve high performance, these models require site- and modality-specific training examples. Incorporating additional targeted training data could further improve performance in detecting real-world errors in reports.

Similar articles

Bigram frequency analysis for detection of radiology report errors. Clin Imaging. 2022 Sep;89:84-88. doi: 10.1016/j.clinimag.2022.06.010. Epub 2022 Jun 23.

Cited by

Using BERT Models to Label Radiology Reports. Radiol Artif Intell. 2022 Jul 27;4(4):e220124. doi: 10.1148/ryai.220124. eCollection 2022 Jul.

