Wang Xue, Sarangi Vivekananda, Wickland Daniel P, Li Shaoyu, Chen Duan, Aubrey Thompson E, Jenkinson Garrett, Asmann Yan W
Department of Quantitative Health Sciences, Mayo Clinic, 4500 San Pablo Rd. S., Jacksonville, FL, USA, 32224.
Department of Quantitative Health Sciences, Mayo Clinic, 200 1st St SW, Rochester, MN, USA, 55905.
Expert Syst Appl. 2025 Mar 1;262. doi: 10.1016/j.eswa.2024.125632. Epub 2024 Nov 4.
Artificial neural networks have recently gained significant attention in biomedical research. However, their use in survival analysis still faces many challenges. Beyond designing models for high accuracy, it is essential to optimize models that provide biologically meaningful insights. With these considerations in mind, we developed a deep neural network model, MaskedNet, to identify genes and pathways whose expression at the time of diagnosis is associated with overall survival. MaskedNet was trained on TCGA breast cancer transcriptome and clinical data, and the model's final output was the predicted logarithm of the hazard ratio for death. The trained model was interpreted using SHapley Additive exPlanations (SHAP), a technique grounded in the Shapley values of cooperative game theory that assigns importance scores to input features. Compared with traditional Cox proportional hazards regression, MaskedNet achieved higher accuracy, as measured by Harrell's C-index. We also found that aggregating outputs from several model runs identified multiple genes and pathways associated with overall survival, including and genes, along with their related pathways. To further elucidate the role of the gene, tumors were partitioned into low- and high-SHAP-value groups. Tumors with lower SHAP values exhibited higher expression and better overall survival, features that were linked to a greater abundance of M1 macrophages and activated CD4+ and CD8+ T cells in the tumor microenvironment. The association of the pathway with overall survival was validated in the trastuzumab arm of the NCCTG-N9831 trial, an independent breast cancer study.
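Because MaskedNet outputs a predicted log hazard ratio and is compared with Cox regression using Harrell's C-index, it may help to see how that metric is computed. The following is a minimal pure-Python sketch, not the authors' evaluation code. The function name `harrell_c_index` is hypothetical. The sketch assumes that a pair is comparable only when the earlier time is an observed death, that pairs with tied follow-up times are skipped, and that tied risk scores count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's C-index for right-censored data (hypothetical helper).

    times: observed follow-up times
    events: 1 if death was observed, 0 if censored
    risk_scores: predicted log hazard ratios (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only an observed death can anchor a comparable pair.
        if events[i] != 1:
            continue
        for j in range(n):
            # Subject j must still be under follow-up after subject i died;
            # pairs with tied times are skipped in this simplified version.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death was assigned higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, not from the study: higher predicted risk for earlier deaths.
    print(harrell_c_index([5, 10, 15], [1, 0, 1], [2.0, 1.0, 0.5]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of death times. This is why a higher C-index is read as better discrimination when comparing MaskedNet with Cox regression.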