Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.

Affiliations

Department of Radiology, Weill Cornell Medicine, New York, New York, United States of America.

Information Technologies and Services, Weill Cornell Medicine, New York, New York, United States of America.

Publication information

PLoS One. 2020 Jul 30;15(7):e0236827. doi: 10.1371/journal.pone.0236827. eCollection 2020.

DOI:10.1371/journal.pone.0236827

PMID:32730362

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7392233/

Abstract

BACKGROUND

Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured in the form of radiology reports, while the process of data collection and curation is arduous and time-consuming.

PURPOSE

We utilized a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients.

MATERIALS AND METHODS

This observational cohort study utilized 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. A total of 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. Thereafter, a Convolutional Neural Network (CNN) was trained, validated, and tested to determine the presence or absence of these features. Further, the ability of the CNN to predict ACM was evaluated using Cox regression analysis on the extracted features.
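The abstract does not describe the network architecture, so the following is only a minimal sketch of how a text CNN can map a report to per-finding probabilities: a filter slides over token embeddings, max-over-time pooling keeps its strongest response, and one sigmoid output per finding gives an independent present/absent score. The vocabulary, the two finding names, and all weights are hand-set, hypothetical values chosen for illustration; a real model would learn them from the annotated reports.

```python
import math

def conv1d_max_pool(embedded, kernel, bias=0.0):
    """Slide one filter over the token sequence; keep its strongest ReLU response."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the report is shorter than the filter
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activation = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        best = max(best, activation)
    return best

def predict_findings(tokens, embeddings, kernels, out_weights, out_bias):
    """Return one independent probability per finding (multi-label sigmoid output)."""
    dim = len(next(iter(embeddings.values())))
    unknown = [0.0] * dim  # out-of-vocabulary tokens map to a zero vector
    embedded = [embeddings.get(t, unknown) for t in tokens]
    pooled = [conv1d_max_pool(embedded, k) for k in kernels]
    probs = []
    for weights, bias in zip(out_weights, out_bias):
        z = bias + sum(w * p for w, p in zip(weights, pooled))
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs

# Hypothetical toy parameters (not taken from the study).
EMBEDDINGS = {
    "pleural": [1.0, 0.0, 0.0],
    "effusion": [0.0, 1.0, 0.0],
    "cardiomegaly": [0.0, 0.0, 1.0],
}
KERNELS = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # width 2: responds to "pleural effusion"
    [[0.0, 0.0, 1.0]],                   # width 1: responds to "cardiomegaly"
]
OUT_WEIGHTS = [[4.0, 0.0], [0.0, 8.0]]   # one row per (hypothetical) finding
OUT_BIAS = [-6.0, -4.0]

report = "moderate pleural effusion is seen".split()
for name, p in zip(["pleural effusion", "cardiomegaly"],
                   predict_findings(report, EMBEDDINGS, KERNELS, OUT_WEIGHTS, OUT_BIAS)):
    print(f"{name}: {p:.2f}")
```

Using one sigmoid per finding, rather than a single softmax, lets several findings be present in the same report, which matches the presence-or-absence labeling described above.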

RESULTS

A total of 11,808 CT reports were analyzed from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% (6,217/11,808) male), of whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy for retrieval of the 14 radiographic findings, with area under the curve (AUC) ranging from 0.83 to 1.00 (F1 score 0.84-0.97). The Cox model showed that the time-dependent AUC for predicting ACM was 0.747 (95% confidence interval [CI], 0.704-0.790) at 30 days.
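For readers unfamiliar with the two classification metrics reported here, the sketch below shows how AUC and F1 can be computed for a single finding from reference labels and predicted probabilities. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data from the study, and the time-dependent AUC of the Cox model is a different, censoring-aware quantity that this sketch does not cover.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """Probability that a random positive report scores above a random negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores for one finding across six reports.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(f"AUC = {roc_auc(y_true, scores):.3f}, F1 = {f1_score(y_true, y_pred):.3f}")
```

AUC is threshold-free and summarizes ranking quality, whereas F1 depends on the chosen cutoff, which is one reason the two ranges reported for the same findings can differ.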

CONCLUSION

An ML-based NLP approach to unstructured CT reports demonstrates excellent accuracy for the extraction of predetermined radiographic findings, and provides prognostic value in HF patients.
