Towards automated generation of curated datasets in radiology: Application of natural language processing to unstructured reports exemplified on CT for pulmonary embolism.

Affiliation

University Hospital Basel, Clinic of Radiology & Nuclear Medicine, University of Basel, Petersgraben 4, 4031 Basel, Switzerland.

Publication information

Eur J Radiol. 2020 Apr;125:108862. doi: 10.1016/j.ejrad.2020.108862. Epub 2020 Feb 6.

DOI: 10.1016/j.ejrad.2020.108862

PMID: 32135443

Abstract

PURPOSE

To design and evaluate a self-trainable natural language processing (NLP)-based procedure to classify unstructured radiology reports. The method, which enables the generation of curated datasets, is exemplified on CT pulmonary angiogram (CTPA) reports.

METHOD

We extracted the impressions of CTPA reports created at our institution from 2016 to 2018 (n = 4397; language: German). The status (pulmonary embolism: yes/no) was manually labelled for all exams. Data from 2016/2017 (n = 2801) served as ground truth to train three NLP architectures that need only a subset of labelled reference data to become operative: a convolutional neural network (CNN), a support vector machine (SVM) and a random forest (RF) classifier. Impressions from 2018 (n = 1377) were set aside and used for general performance measurements. Furthermore, we ran multiple simulations to investigate how classification performance depends on the amount of training data.
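The evaluation design described above (train on earlier years, hold out the latest year, then repeat training on random subsets of different sizes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give implementation details, so a small multinomial naive Bayes bag-of-words classifier is swapped in for the CNN, SVM and RF models, and the German impressions, the helper names (`temporal_split`, `learning_curve`) and the subset sizes are all invented for the example.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Whitespace tokenization on lower-cased text; real reports need more care.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes on word counts (stand-in, not the paper's model)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        def log_score(c):
            score = math.log(self.doc_counts[c] / self.n_docs)
            for tok in tokenize(text):
                # Laplace (add-one) smoothing handles unseen words.
                score += math.log((self.word_counts[c][tok] + 1)
                                  / (self.totals[c] + self.vocab_size))
            return score
        return max(self.classes, key=log_score)


def temporal_split(records, test_year):
    # Earlier years form the training pool; the test year is held out.
    train = [(text, y) for year, text, y in records if year < test_year]
    test = [(text, y) for year, text, y in records if year == test_year]
    return train, test


def accuracy(model, data):
    return sum(model.predict(text) == y for text, y in data) / len(data)


def learning_curve(train, test, sizes, repeats=5, seed=0):
    # Mean held-out accuracy over repeated random training subsets per size.
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(repeats):
            texts, labels = zip(*rng.sample(train, size))
            scores.append(accuracy(NaiveBayes().fit(texts, labels), test))
        curve[size] = sum(scores) / repeats
    return curve


# Invented toy impressions (year, text, label); 1 = pulmonary embolism present.
records = [
    (2016, "kein nachweis einer lungenembolie", 0),
    (2016, "lungenembolie rechts segmental", 1),
    (2017, "keine lungenembolie", 0),
    (2017, "zentrale lungenembolie beidseits", 1),
    (2017, "kein hinweis auf lungenembolie", 0),
    (2017, "lungenembolie links subsegmental", 1),
    (2018, "keine lungenembolie nachweisbar", 0),
    (2018, "lungenembolie rechts zentral", 1),
]

train, test = temporal_split(records, test_year=2018)
curve = learning_curve(train, test, sizes=[2, 4, 6])
for size, acc in curve.items():
    print(f"training size {size}: mean accuracy {acc:.2f}")
```

Holding out the most recent year, rather than a random sample, keeps the test set strictly later than the training data, which is closer to how such a classifier would be applied to new reports.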

RESULTS

The classification performance of all three models was excellent (accuracies: 97%-99%; F1 scores: 0.88-0.97; AUCs: 0.993-0.997). The highest accuracy was reached by the CNN at 99.1% (95% CI 98.5-99.6%). Training with 470 labelled impressions was sufficient to reach an accuracy of > 93% with all three NLP architectures.
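A confidence interval for an accuracy such as the one reported above can be obtained from the number of correct classifications and the test set size. The sketch below uses the Wilson score interval, which is an assumption: the abstract does not say which interval method was used. The test set size is taken from the METHOD section, while the count of correct classifications (`n_correct = 1365`) is a hypothetical value chosen only because it corresponds to roughly 99.1% of 1377.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * ((p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5) / denom
    return centre - half_width, centre + half_width


n_test = 1377     # size of the 2018 test set, as stated in the abstract
n_correct = 1365  # hypothetical count, not reported in the abstract

lo, hi = wilson_interval(n_correct, n_test)
print(f"accuracy = {n_correct / n_test:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Wilson and exact (Clopper-Pearson) intervals differ slightly for proportions this close to 1, so the result need not match the reported interval digit for digit.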

CONCLUSION

Our NLP-based approaches allow for an automated and highly accurate retrospective classification of CTPA reports with manageable effort solely using unstructured impression sections. We demonstrated that this approach is useful for the classification of radiology reports not written in English. Moreover, excellent classification performance is achieved at relatively small training set sizes.

Similar articles

Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.

PLoS One. 2020 Jul 30;15(7):e0236827. doi: 10.1371/journal.pone.0236827. eCollection 2020.

Deep Learning to Classify Radiology Free-Text Reports.

Radiology. 2018 Mar;286(3):845-852. doi: 10.1148/radiol.2017171115. Epub 2017 Nov 13.

Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield.

AJR Am J Roentgenol. 2017 Apr;208(4):750-753. doi: 10.2214/AJR.16.16128. Epub 2017 Jan 31.

Development and Validation of a Natural Language Processing Model to Identify Low-Risk Pulmonary Embolism in Real Time to Facilitate Safe Outpatient Management.

Ann Emerg Med. 2024 Aug;84(2):118-127. doi: 10.1016/j.annemergmed.2024.01.036. Epub 2024 Mar 2.

Transformer versus traditional natural language processing: how much data is enough for automated radiology report classification?

Br J Radiol. 2023 Sep;96(1149):20220769. doi: 10.1259/bjr.20220769. Epub 2023 May 25.

Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.

J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.

Creation of a simple natural language processing tool to support an imaging utilization quality dashboard.

Int J Med Inform. 2017 May;101:93-99. doi: 10.1016/j.ijmedinf.2017.02.011. Epub 2017 Feb 21.

Leveraging open dataset and transfer learning for accurate recognition of chronic pulmonary embolism from CT angiogram maximum intensity projection images.

Eur Radiol Exp. 2023 Jun 21;7(1):33. doi: 10.1186/s41747-023-00346-9.

Comprehensive Word-Level Classification of Screening Mammography Reports Using a Neural Network Sequence Labeling Approach.

J Digit Imaging. 2019 Oct;32(5):685-692. doi: 10.1007/s10278-018-0141-4.

Cited by

Development and Validation of VTE-BERT Natural Language Processing Model for Venous Thromboembolism.

J Thromb Haemost. 2025 Aug 1. doi: 10.1016/j.jtha.2025.07.021.

BERT-based natural language processing analysis of French CT reports: Application to the measurement of the positivity rate for pulmonary embolism.

Res Diagn Interv Imaging. 2023 Mar 27;6:100027. doi: 10.1016/j.redii.2023.100027. eCollection 2023 Jun.

Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis.

Blood Adv. 2024 Jun 25;8(12):2991-3000. doi: 10.1182/bloodadvances.2023012200.

Efficient management of pulmonary embolism diagnosis using a two-step interconnected machine learning model based on electronic health records data.

Health Inf Sci Syst. 2024 Mar 6;12(1):17. doi: 10.1007/s13755-024-00276-9. eCollection 2024 Dec.

Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.

Front Public Health. 2022 Nov 17;10:1031147. doi: 10.3389/fpubh.2022.1031147. eCollection 2022.

The Use of BP Neural Network Algorithm and Natural Language Processing in the Impact of Social Audit on Enterprise Innovation Ability.

Comput Intell Neurosci. 2022 May 18;2022:7297769. doi: 10.1155/2022/7297769. eCollection 2022.

Predicting pulmonary embolism among hospitalized patients with machine learning algorithms.

Pulm Circ. 2022 Jan 11;12(1):e12013. doi: 10.1002/pul2.12013. eCollection 2022 Jan.

Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance.

J Med Syst. 2021 Sep 4;45(10):91. doi: 10.1007/s10916-021-01761-4.
