Deep learning using histological images for gene mutation prediction in lung cancer: a multicentre retrospective study.

Authors

Zhao Yu, Xiong Shan, Ren Qin, Wang Jun, Li Min, Yang Lin, Wu Di, Tang Kejing, Pan Xiaojie, Chen Fengxia, Wang Wenxiang, Jin Shi, Liu Xianling, Lin Gen, Yao Wenxiu, Cai Linbo, Yang Yi, Liu Jixian, Wu Jingxun, Fu Wenfan, Sun Kai, Li Feng, Cheng Bo, Zhan Shuting, Wang Haixuan, Yu Ziwen, Liu Xiwen, Zhong Ran, Wang Huiting, He Ping, Zheng Yongmei, Liang Peng, Chen Longfei, Hou Ting, Huang Junzhou, He Bing, Song Jiangning, Wu Lin, Hu Chengping, He Jianxing, Yao Jianhua, Liang Wenhua

Affiliations

Department of Thoracic Oncology and Surgery, The First Affiliated Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, Guangzhou, China; AI Lab, Tencent, Shenzhen, China.

Publication information

Lancet Oncol. 2025 Jan;26(1):136-146. doi: 10.1016/S1470-2045(24)00599-0. Epub 2024 Dec 6.

DOI:10.1016/S1470-2045(24)00599-0

PMID:39653054

Abstract

BACKGROUND

Accurate detection of driver gene mutations is crucial for treatment planning and predicting prognosis for patients with lung cancer. Conventional genomic testing requires high-quality tissue samples and is time-consuming and resource-consuming, and as a result, is not available for most patients, especially those in low-resource settings. We aimed to develop an annotation-free Deep learning-enabled artificial intelligence method to predict GEne Mutations (DeepGEM) from routinely acquired histological slides.

METHODS

In this multicentre retrospective study, we collected data for patients with lung cancer who had a biopsy and multigene next-generation sequencing done at 16 hospitals in China (with no restrictions on age, sex, or histology type), to form a large multicentre dataset comprising paired pathological image and multiple gene mutation information. We also included patients from The Cancer Genome Atlas (TCGA) publicly available dataset. Our developed model is an instance-level and bag-level co-supervised multiple instance learning method with label disambiguation design. We trained and initially tested the DeepGEM model on the internal dataset (patients from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China), and further evaluated it on the external dataset (patients from the remaining 15 centres) and the public TCGA dataset. Additionally, a dataset of patients from the same medical centre as the internal dataset, but without overlap, was used to evaluate the model's generalisation ability to biopsy samples from lymph node metastases. The primary objective was the performance of the DeepGEM model in predicting gene mutations (area under the curve [AUC] and accuracy) in the four prespecified groups (ie, the hold-out internal test set, multicentre external test set, TCGA set, and lymph node metastases set).
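The abstract names the model family (multiple instance learning with both instance-level and bag-level supervision) but gives no architectural detail. The following is a minimal generic sketch of that family, not the DeepGEM implementation: a slide is treated as a bag of tile feature vectors, attention pooling produces the slide-level (bag) prediction, and the most attended tiles inherit the slide label as pseudo-labels for an instance-level loss. The attention pooling, the top-k pseudo-labelling rule (a crude stand-in for label disambiguation), and every name and parameter (`bag_forward`, `co_supervised_loss`, `top_k`, `inst_weight`) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bag_forward(
    tiles: Sequence[Sequence[float]],
    attn_w: Sequence[float],
    clf_w: Sequence[float],
    clf_b: float,
) -> Tuple[float, List[float], List[float]]:
    """Return (bag probability, attention weights, per-tile probabilities).

    Hypothetical linear attention and linear classifier, for illustration only.
    """
    attention = softmax([dot(t, attn_w) for t in tiles])
    dim = len(tiles[0])
    # Attention-weighted average of tile features gives one slide embedding.
    pooled = [sum(a * t[j] for a, t in zip(attention, tiles)) for j in range(dim)]
    bag_prob = sigmoid(dot(pooled, clf_w) + clf_b)
    # The same classifier scores each tile on its own (instance level).
    inst_probs = [sigmoid(dot(t, clf_w) + clf_b) for t in tiles]
    return bag_prob, attention, inst_probs


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one prediction."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def co_supervised_loss(
    bag_prob: float,
    inst_probs: Sequence[float],
    attention: Sequence[float],
    slide_label: int,
    top_k: int = 2,
    inst_weight: float = 0.5,
) -> float:
    """Bag loss plus instance loss on the top-k attended tiles.

    Assumption: only the most attended tiles receive the slide label as a
    pseudo-label; the remaining tiles are left unsupervised.
    """
    bag_loss = bce(bag_prob, slide_label)
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    chosen = ranked[:top_k]
    inst_loss = sum(bce(inst_probs[i], slide_label) for i in chosen) / len(chosen)
    return bag_loss + inst_weight * inst_loss


if __name__ == "__main__":
    tiles = [[0.1, 0.2], [2.0, 1.5], [0.0, -0.3]]  # toy tile features
    prob, att, inst = bag_forward(tiles, [1.0, 1.0], [0.8, 0.6], -1.0)
    print(prob, att, co_supervised_loss(prob, inst, att, slide_label=1))
```

In a sketch like this, the attention weights double as a per-tile relevance score, which is one common way to render a spatial map over a slide; whether DeepGEM derives its mutation maps this way is not stated in the abstract.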

FINDINGS

Assessable pathological images and multigene testing information were available for 3697 patients who had biopsy and multigene next-generation sequencing done between Jan 1, 2018, and March 31, 2022, at the 16 centres. We excluded 60 patients with low-quality images. We included 3767 images from 3637 consecutive patients (1978 [54·4%] men, 1514 [41·6%] women, 145 [4·0%] unknown; median age 60 years [IQR 52-67]), with 1716 patients in the internal dataset, 1718 patients in the external dataset, and 203 patients in the lymph node metastases dataset. The DeepGEM model showed robust performance in the internal dataset: for excisional biopsy samples, AUC values for gene mutation prediction ranged from 0·90 (95% CI 0·77-1·00) to 0·97 (0·93-1·00) and accuracy values ranged from 0·91 (0·85-0·98) to 0·97 (0·93-1·00); for aspiration biopsy samples, AUC values ranged from 0·85 (0·80-0·91) to 0·95 (0·86-1·00) and accuracy values ranged from 0·79 (0·74-0·85) to 0·99 (0·98-1·00). In the multicentre external dataset, for excisional biopsy samples, AUC values ranged from 0·80 (95% CI 0·75-0·85) to 0·91 (0·88-1·00) and accuracy values ranged from 0·79 (0·76-0·82) to 0·95 (0·93-0·96); for aspiration biopsy samples, AUC values ranged from 0·76 (0·70-0·83) to 0·87 (0·80-0·94) and accuracy values ranged from 0·76 (0·74-0·79) to 0·97 (0·96-0·98). The model also showed strong performance on the TCGA dataset (473 patients; 535 slides; AUC values ranged from 0·82 [95% CI 0·71-0·93] to 0·96 [0·91-1·00], accuracy values ranged from 0·79 [0·70-0·88] to 0·95 [0·90-1·00]). The DeepGEM model, trained on primary region biopsy samples, could be generalised to biopsy samples from lymph node metastases, with AUC values of 0·91 (95% CI 0·88-0·94) for EGFR and 0·88 (0·82-0·93) for KRAS and accuracy values of 0·85 (0·80-0·88) for EGFR and 0·95 (0·92-0·96) for KRAS, and showed potential for prognostic prediction of targeted therapy. The model also generated spatial gene mutation maps that indicate the spatial distribution of gene mutations.
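The findings are reported as AUC and accuracy with 95% confidence intervals. As a rough illustration of how such figures can be obtained from per-slide labels and predicted probabilities, the sketch below computes AUC as a pairwise ranking probability, accuracy at a fixed threshold, and a percentile bootstrap interval. The abstract does not say how the intervals were derived, so the bootstrap, the 0.5 threshold, and the helper names (`auc`, `accuracy`, `bootstrap_ci`) are assumptions, and the toy data are invented for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    metric: Callable[[Sequence[int], Sequence[float]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with one class, where AUC is undefined
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]             # toy mutation status per slide
    y_prob = [0.1, 0.4, 0.35, 0.8]    # toy predicted probabilities
    print(auc(y_true, y_prob), accuracy(y_true, y_prob))
    print(bootstrap_ci(y_true, y_prob, auc, n_boot=200))
```

With small test groups or rare mutations, resampled intervals of this kind tend to be wide and can reach the upper bound of 1·00, which is one plausible reading of several intervals quoted above.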

INTERPRETATION

We developed an AI-based method that can provide an accurate, timely, and economical prediction of gene mutation and mutation spatial distribution. The method showed substantial potential as an assistive tool for guiding the clinical treatment of patients with lung cancer.

FUNDING

National Natural Science Foundation of China, the Science and Technology Planning Project of Guangzhou, and the National Key Research and Development Program of China.

TRANSLATION

For the Chinese translation of the abstract see Supplementary Materials section.
