Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning.

Author information

Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, South Korea.

Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.

Publication information

BMC Med Inform Decis Mak. 2022 Sep 1;22(1):229. doi: 10.1186/s12911-022-01975-7.

DOI:10.1186/s12911-022-01975-7

PMID:36050674

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9438247/

Abstract

BACKGROUND

Extracting metastatic information from previous radiologic text reports is important; however, laborious annotation has limited the usability of these texts. We developed a deep-learning model that extracts primary lung cancer sites, metastatic lymph nodes, and distant metastasis information from PET-CT reports to determine lung cancer stage.

METHODS

PET-CT reports, written entirely in English, were acquired from two cohorts of patients with lung cancer who were diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4,190 PET-CT reports was used as an additional-test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was built on a convolutional-recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.
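
The abstract does not describe how pseudo-labelling was carried out, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes a binary label (for example, distant metastasis present or absent), a hypothetical confidence threshold of 0.9, and a toy keyword scorer (`train_keyword_model`) standing in for the real classifier; all names and example sentences are illustrative.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]            # (report text, binary label)
Model = Callable[[str], float]       # returns estimated P(label == 1)


def train_keyword_model(examples: List[Example]) -> Model:
    """Toy stand-in for a real classifier (assumption, not the paper's model).

    Scores a text by the share of its known words that were seen only in
    positive training examples.
    """
    pos_words = {w for text, y in examples if y == 1 for w in text.lower().split()}
    neg_words = {w for text, y in examples if y == 0 for w in text.lower().split()}
    pos_only, neg_only = pos_words - neg_words, neg_words - pos_words

    def predict(text: str) -> float:
        words = text.lower().split()
        pos_hits = sum(w in pos_only for w in words)
        neg_hits = sum(w in neg_only for w in words)
        if pos_hits + neg_hits == 0:
            return 0.5  # no evidence either way
        return pos_hits / (pos_hits + neg_hits)

    return predict


def pseudo_label_round(
    labelled: List[Example],
    unlabelled: List[str],
    train: Callable[[List[Example]], Model],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> List[Example]:
    """One round of pseudo-labelling.

    Train on the labelled set, predict on unlabelled texts, and keep only
    confident predictions as extra (pseudo-labelled) training examples.
    """
    model = train(labelled)
    confident: List[Example] = []
    for text in unlabelled:
        p = model(text)
        if p >= threshold:
            confident.append((text, 1))
        elif p <= 1 - threshold:
            confident.append((text, 0))
        # uncertain texts are left unlabelled
    return labelled + confident


labelled = [("hypermetabolic mass metastasis", 1), ("no abnormal uptake", 0)]
unlabelled = ["bone metastasis", "no uptake", "stable nodule"]
augmented = pseudo_label_round(labelled, unlabelled, train_keyword_model)
```

In practice the augmented set would be used to retrain the model, and the round could be repeated; the threshold trades label noise against the amount of added data.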

RESULTS

For the extraction of primary lung cancer location, the model showed a micro-AUROC of 0.913 and 0.946 in the validation set and the additional-test set, respectively. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. In predicting distant metastasis, the model showed a micro-AUROC of 0.944 and 0.950 in the validation and the additional-test set, respectively.
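
For readers unfamiliar with these metrics, the sketch below shows one common way to compute sensitivity, specificity, and a micro-averaged AUROC for multi-label output. It is not the authors' evaluation code: the function names and the toy labels are assumptions made for illustration, and the AUROC is computed with the rank-based (Mann-Whitney) formulation.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def micro_auroc(
    y_true_rows: List[Sequence[int]], score_rows: List[Sequence[float]]
) -> float:
    """Micro-average: pool every (report, class) pair into one binary problem."""
    flat_true = [t for row in y_true_rows for t in row]
    flat_score = [s for row in score_rows for s in row]
    return auroc(flat_true, flat_score)


# Toy example: 5 binary decisions, then 2 reports x 2 classes.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
micro = micro_auroc([[1, 0], [1, 0]], [[0.9, 0.1], [0.4, 0.6]])
```

Micro-averaging weights every report-class decision equally, so frequent classes dominate the score; a macro average would instead weight each class equally.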

CONCLUSION

Our deep-learning method could be used for extracting lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by alleviating laborious annotation by clinicians.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d853/9438247/b9250b3e6e80/12911_2022_1975_Fig1_HTML.jpg
