Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer.

Affiliations

Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany.

Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany; Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany; Scientific Core Facility Medical Biometry and Statistical Bioinformatics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany.

Publication information

Artif Intell Med. 2024 May;151:102840. doi: 10.1016/j.artmed.2024.102840. Epub 2024 Mar 11.

DOI: 10.1016/j.artmed.2024.102840
PMID: 38658129

Abstract

High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and in identifying novel drug targets. With Mammaprint, Oncotype DX, and many other prognostic molecular signatures, breast cancer is one of the paradigmatic examples of the utility of high-throughput data for delivering prognostic biomarkers that can be represented as a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a Machine Learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant for cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. Stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. The Graph Convolutional Neural Network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior-knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods for explaining individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP to construct feature sets by aggregating individual explanations. We suggest a methodology to systematically and quantitatively analyze the stability, the impact on classification performance, and the interpretability of the selected feature sets. We used this methodology to compare GCNN+LRP to GCNN+SHAP and to more classical ML-based feature selection approaches. Utilizing a large breast cancer gene expression dataset, we show that, while feature selection with SHAP is useful in applications where selected features have to be impactful for classification performance, among all studied methods GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists.
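
The abstract describes building gene lists by aggregating per-patient explanations and then quantifying how stable those lists are, without stating the exact rules used. The following is a minimal sketch of that idea, under stated assumptions: aggregation by mean absolute relevance, a top-k cut-off, and stability measured as the mean pairwise Jaccard index are all illustrative choices, not necessarily the authors' method. The function names, gene symbols, and relevance values are hypothetical toy data.

```python
from itertools import combinations


def aggregate_relevance(per_sample_scores):
    """Average the absolute relevance of each gene over all samples.

    Assumption: mean absolute relevance is the aggregation rule; a gene
    missing from a sample's explanation counts as zero for that sample.
    """
    totals = {}
    for scores in per_sample_scores:
        for gene, value in scores.items():
            totals[gene] = totals.get(gene, 0.0) + abs(value)
    n = len(per_sample_scores)
    return {gene: total / n for gene, total in totals.items()}


def top_k_genes(aggregated, k):
    """Return the k genes with the highest aggregated relevance as a set."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(aggregated, key=lambda g: (-aggregated[g], g))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(gene_lists):
    """Assumed stability score: mean Jaccard index over all pairs of lists."""
    pairs = list(combinations(gene_lists, 2))
    if not pairs:
        raise ValueError("need at least two gene lists to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy per-patient relevance scores from two hypothetical resampling runs.
run_a = [{"ESR1": 0.9, "ERBB2": 0.4, "TP53": 0.1},
         {"ESR1": 0.7, "ERBB2": 0.6, "TP53": 0.2}]
run_b = [{"ESR1": 0.8, "ERBB2": 0.1, "TP53": 0.5},
         {"ESR1": 0.6, "ERBB2": 0.2, "TP53": 0.7}]

list_a = top_k_genes(aggregate_relevance(run_a), k=2)
list_b = top_k_genes(aggregate_relevance(run_b), k=2)
stability = mean_pairwise_jaccard([list_a, list_b])
print(sorted(list_a), sorted(list_b), round(stability, 3))
```

In this toy case the two runs share one of their two selected genes, so the stability score is 1/3. A score closer to 1 would indicate that repeated runs select nearly the same genes, which is the reproducibility property the abstract attributes to GCNN+LRP.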
