Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports.

Author affiliations

Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea.

Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea.

Publication information

Eur Radiol. 2023 Nov;33(11):8017-8025. doi: 10.1007/s00330-023-10061-z. Epub 2023 Aug 11.

DOI:10.1007/s00330-023-10061-z

PMID:37566271

Abstract

OBJECTIVES

To evaluate the performance of natural language processing (NLP) models to predict isocitrate dehydrogenase (IDH) mutation status in diffuse glioma using routine MR radiology reports.

MATERIALS AND METHODS

This retrospective, multi-center study included consecutive patients with diffuse glioma with known IDH mutation status from May 2009 to November 2021 whose initial MR radiology report was available prior to pathologic diagnosis. Five NLP models (long short-term memory [LSTM], bidirectional LSTM, bidirectional encoder representations from transformers [BERT], BERT graph convolutional network [GCN], BioBERT) were trained, and area under the receiver operating characteristic curve (AUC) was assessed to validate prediction of IDH mutation status in the internal and external validation sets. The performance of the best performing NLP model was compared with that of the human readers.
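As a rough illustration of the evaluation metric named above, the following Python sketch computes an AUC from predicted scores and derives a 95% confidence interval. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, as are the function names, the 1/0 label encoding, and the toy data.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for IDH-mutant, 0 for IDH-wildtype (hypothetical encoding).
    scores: model-predicted score for the mutant class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC undefined, draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy data, invented purely for illustration (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
toy_auc = auc(toy_labels, toy_scores)
toy_lo, toy_hi = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise formulation is quadratic in the number of cases, which is acceptable at the scale of a few hundred reports; a rank-based implementation would be preferable for larger sets.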

RESULTS

A total of 1427 patients (mean age ± standard deviation, 54 ± 15 years; 779 men, 54.6%) were included: 720 patients in the training set, 180 patients in the internal validation set, and 527 patients in the external validation set. In the external validation set, BERT GCN showed the highest performance (AUC 0.85, 95% CI 0.81-0.89) in predicting IDH mutation status, which was higher than LSTM (AUC 0.77, 95% CI 0.72-0.81; p = .003) and BioBERT (AUC 0.81, 95% CI 0.76-0.85; p = .03). The performance of BERT GCN was also higher than that of a neuroradiologist (AUC 0.80, 95% CI 0.76-0.84; p = .005) and a neurosurgeon (AUC 0.79, 95% CI 0.76-0.84; p = .04).

CONCLUSION

BERT GCN was externally validated to predict IDH mutation status in patients with diffuse glioma using routine MR radiology reports, with performance superior or at least comparable to that of human readers.

CLINICAL RELEVANCE STATEMENT

Natural language processing may be used to extract relevant information from routine radiology reports to predict cancer genotype and provide prognostic information that may aid in guiding treatment strategy and enabling personalized medicine.

KEY POINTS

• A transformer-based natural language processing (NLP) model predicted isocitrate dehydrogenase mutation status in diffuse glioma with an AUC of 0.85 in the external validation set.
• The best NLP models were superior or at least comparable to human readers in both internal and external validation sets.
• Transformer-based models showed higher performance than conventional NLP models such as long short-term memory.
