Saha Monjoy, Tran Thi-Van-Trinh, Bhawsar Praphulla Ms, Zhang Tongwu, Zhao Wei, Hoang Phuc H, Mutreja Karun, Lawrence Scott M, Rothman Nathaniel, Lan Qing, Homer Robert, Baine Marina K, Sholl Lynette M, Joubert Philippe, Leduc Charles, Travis William D, Chanock Stephen J, Shi Jianxin, Yang Soo-Ryum, Almeida Jonas S, Landi Maria Teresa
bioRxiv. 2025 Aug 20:2025.08.14.670178. doi: 10.1101/2025.08.14.670178.
Despite promising results in using deep learning to infer genetic features from histological whole-slide images (WSIs), no prior studies have specifically applied these methods to lung adenocarcinomas from subjects who have never smoked tobacco (NS-LUAD), a molecularly and histologically distinct subset of lung cancer. Existing models have focused on LUAD from predominantly smoker populations, with limited molecular scope and variable performance. Here, we propose a customized deep convolutional neural network based on the ResNet50 architecture, optimized for multilabel classification in NS-LUAD, enabling simultaneous prediction of 16 molecular alterations from a single H&E-stained WSI. Key architectural modifications included a simplified two-layer residual block without bottleneck layers, selective shortcut connections, and a sigmoid-based classification head for independent prediction of each alteration, designed to reduce computational complexity while maintaining predictive accuracy. The model was trained and evaluated on 495 WSIs from the Sherlock- study (70% training with 10% internal test set for 10-fold cross-validation, and 30% held-out validation set for final evaluation). On the held-out validation data, our model achieved high area under the receiver operating characteristic curve (AUROC) values of 0.84-0.93 for detecting 11 features: mutations, amplification, kataegis, deletion, fusion, whole-genome doubling, and hotspot mutations (p.L858R and p.E746_A750del). Performance was low to moderate for tumor mutational burden (AUROC=0.67), APOBEC mutational signature (AUROC=0.57), and hotspot mutations (p.G12C: AUROC=0.74, p.G12V: AUROC=0.55, p.G12D: AUROC=0.43). Compared to results from established architectures such as Inception-v3 on the same WSIs, our model demonstrated significantly improved performance for most features.
With further optimization, our model could support triaging for molecular testing and inform precision treatment strategies for NS-LUAD patients.
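To illustrate the two evaluation ideas the abstract relies on, the following is a minimal sketch, not the authors' implementation. It shows how a sigmoid head turns one logit per alteration into independent probabilities (so several alterations can be positive for the same slide), and how a per-label AUROC can be computed as the fraction of positive/negative pairs that are ranked correctly. The function names, the label names `alteration_a` and `alteration_b`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_probs(logits):
    """Apply the sigmoid to each logit independently.

    Unlike a softmax, the outputs are not forced to sum to 1, so each
    label is scored on its own.
    """
    return [sigmoid(z) for z in logits]


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n^2) form is fine for a
    small illustration; it is not meant for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def per_label_auroc(label_matrix, logit_matrix, names):
    """Compute one AUROC per label column from per-slide logits."""
    probs = [multilabel_probs(row) for row in logit_matrix]
    return {
        name: auroc([row[j] for row in label_matrix], [p[j] for p in probs])
        for j, name in enumerate(names)
    }


# Toy data (hypothetical): 4 slides, 2 binary labels per slide.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_logits = [[2.0, 0.0], [-1.5, 0.5], [0.3, -0.2], [-0.4, -2.0]]
toy_result = per_label_auroc(
    toy_labels, toy_logits, ["alteration_a", "alteration_b"]
)
```

In this toy case every positive slide for `alteration_a` outranks every negative one, while for `alteration_b` one of the four positive/negative pairs is ranked the wrong way round, so the two labels receive different AUROC values even though they come from the same set of slides.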