Using feature selection and Bayesian network identify cancer subtypes based on proteomic data.

School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China.

J Proteomics. 2023 May 30;280:104895. doi: 10.1016/j.jprot.2023.104895. Epub 2023 Apr 5.

The Cancer Proteome Atlas (TCPA) project collects reverse-phase protein array (RPPA)-based proteome datasets from nearly 8000 samples across 32 cancer types. This study aims to investigate the pan-cancer proteome signature and to identify cancer subtypes of glioma, kidney cancer, and lung cancer based on TCPA data. We first visualized the tumor clustering models using t-distributed stochastic neighbour embedding (t-SNE) and a bi-clustering heatmap. Then, three feature selection methods (pyHSICLasso, XGBoost, and Random Forest) were applied to the training dataset to select protein features for classifying cancer subtypes, and the LibSVM algorithm was employed to test classification accuracy on the validation dataset. Clustering analysis revealed that different tumor types have relatively distinct proteomic profiles according to their tissue of origin. We identified 20, 10, and 20 protein features that gave the highest accuracies in classifying subtypes of glioma, kidney cancer, and lung cancer, respectively. The predictive abilities of the selected proteins were confirmed by receiver operating characteristic (ROC) analysis. Finally, a Bayesian network was used to explore the protein biomarkers that have direct causal relationships with cancer subtypes. Overall, we highlight the theoretical and technical applications of machine-learning-based feature selection approaches in the analysis of high-throughput biological data, particularly for cancer biomarker research.

Significance

Functional proteomics is a powerful approach for characterizing cell signaling pathways and understanding their phenotypic effects on cancer development. The TCPA database provides a platform to explore and analyze TCGA pan-cancer RPPA-based protein expression. With the advent of RPPA technology, the high-throughput data available on the TCPA platform have made it possible to use machine learning methods to identify protein biomarkers and to differentiate cancer subtypes based on proteomic data. In this study, we highlight the role of feature selection and Bayesian networks in discovering protein biomarkers for classifying cancer subtypes from functional proteomic data. Applying machine learning methods to high-throughput biological data, particularly in cancer biomarker research, has potential clinical value for developing individualized treatment strategies.
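The core workflow described above (select protein features on the training set, test them on a held-out validation set, and summarise predictive ability with ROC analysis) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: a univariate Welch t-score stands in for pyHSICLasso, XGBoost, and Random Forest; a nearest-centroid rule stands in for the LibSVM classifier; the task is reduced to two hypothetical subtypes; and the expression values are toy numbers. The helper names (`t_score`, `select_top_k`, `fit_centroids`, `decision_score`, `roc_auc`) are hypothetical.

```python
# Minimal sketch (not the authors' pipeline): rank protein features on a
# training split, keep the top k, then evaluate on a held-out validation split.
# Assumptions: a Welch t-score replaces pyHSICLasso/XGBoost/Random Forest,
# a nearest-centroid rule replaces LibSVM, and all data are toy values.
import math
from statistics import mean, variance


def t_score(values, labels):
    """Absolute Welch t statistic between class 1 and class 0 (needs >= 2 samples per class)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    # Fall back to a tiny denominator when both classes have zero variance.
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b)) or 1e-12
    return abs(mean(a) - mean(b)) / denom


def select_top_k(X, y, k):
    """Return the indices of the k feature columns with the largest t-scores."""
    n_features = len(X[0])
    scores = [t_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Per-class mean vector over the selected features only."""
    return {
        c: [mean([row[j] for row, lab in zip(X, y) if lab == c]) for j in features]
        for c in set(y)
    }


def decision_score(row, centroids, features):
    """Positive when the sample lies closer to the class-1 centroid."""
    x = [row[j] for j in features]
    return math.dist(x, centroids[0]) - math.dist(x, centroids[1])


def roc_auc(scores, labels):
    """ROC AUC via the rank-sum identity: P(positive score > negative score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy expression matrix: rows are samples, columns are four hypothetical proteins.
# Columns 0 and 2 separate the two hypothetical subtypes; columns 1 and 3 are noise.
X_train = [
    [2.1, 0.3, 1.0, 0.5], [1.9, 0.1, 1.2, 0.4], [2.3, 0.2, 0.9, 0.6],
    [0.2, 0.3, -1.0, 0.5], [0.1, 0.2, -1.1, 0.4], [0.3, 0.1, -0.9, 0.6],
]
y_train = [1, 1, 1, 0, 0, 0]
X_val = [
    [1.8, 0.2, 0.8, 0.5], [2.0, 0.4, 1.1, 0.3],
    [0.4, 0.1, -0.7, 0.6], [0.0, 0.3, -1.2, 0.4],
]
y_val = [1, 1, 0, 0]

# Feature selection uses the training split only, so validation metrics are not inflated.
selected = select_top_k(X_train, y_train, k=2)
centroids = fit_centroids(X_train, y_train, selected)
val_scores = [decision_score(row, centroids, selected) for row in X_val]
val_pred = [1 if s > 0 else 0 for s in val_scores]
accuracy = sum(p == t for p, t in zip(val_pred, y_val)) / len(y_val)
auc = roc_auc(val_scores, y_val)
print(f"selected features: {selected}, validation accuracy: {accuracy:.2f}, AUC: {auc:.2f}")
```

On this toy data the selected feature indices are [2, 0]. The design point worth keeping from the abstract is the split: features are ranked using the training data only, and accuracy and AUC are computed on samples the selection step never saw.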

Similar articles

Using feature selection and Bayesian network identify cancer subtypes based on proteomic data.

J Proteomics. 2023 May 30;280:104895. doi: 10.1016/j.jprot.2023.104895. Epub 2023 Apr 5.

The Weight-Based Feature Selection (WBFS) Algorithm Classifies Lung Cancer Subtypes Using Proteomic Data.

Entropy (Basel). 2023 Jun 29;25(7):1003. doi: 10.3390/e25071003.

TCPA v3.0: An Integrative Platform to Explore the Pan-Cancer Analysis of Functional Proteomic Data.

Mol Cell Proteomics. 2019 Aug 9;18(8 suppl 1):S15-S25. doi: 10.1074/mcp.RA118.001260. Epub 2019 Jun 14.

High-throughput proteomics of breast cancer subtypes: Biological characterization and multiple candidate biomarker panels to patients' stratification.

J Proteomics. 2023 Aug 15;285:104955. doi: 10.1016/j.jprot.2023.104955. Epub 2023 Jun 28.

Functional Proteomic Profiling Analysis in Four Major Types of Gastrointestinal Cancers.

Biomolecules. 2023 Apr 20;13(4):701. doi: 10.3390/biom13040701.

Explore, Visualize, and Analyze Functional Cancer Proteomic Data Using the Cancer Proteome Atlas.

Cancer Res. 2017 Nov 1;77(21):e51-e54. doi: 10.1158/0008-5472.CAN-17-0369.

Bayesian data integration and variable selection for pan-cancer survival prediction using protein expression data.

Biometrics. 2020 Mar;76(1):316-325. doi: 10.1111/biom.13132. Epub 2019 Oct 3.

Identification of protein signatures for lung cancer subtypes based on BPSO method.

PLoS One. 2023 Dec 7;18(12):e0294243. doi: 10.1371/journal.pone.0294243. eCollection 2023.

Evaluation of reverse phase protein array (RPPA)-based pathway-activation profiling in 84 non-small cell lung cancer (NSCLC) cell lines as platform for cancer proteomics and biomarker discovery.

Biochim Biophys Acta. 2014 May;1844(5):950-9. doi: 10.1016/j.bbapap.2013.11.017. Epub 2013 Dec 19.

Proteomic Features of Colorectal Cancer Identify Tumor Subtypes Independent of Oncogenic Mutations and Independently Predict Relapse-Free Survival.

Ann Surg Oncol. 2017 Dec;24(13):4051-4058. doi: 10.1245/s10434-017-6054-5. Epub 2017 Sep 21.

Cited by

Navigating the microarray landscape: a comprehensive review of feature selection techniques and their applications.

Front Big Data. 2025 Jul 10;8:1624507. doi: 10.3389/fdata.2025.1624507. eCollection 2025.

Acute Myeloid Leukemia Genome Characterization Study and Subtype Classification Employing Feature Selection and Bayesian Networks.

Biomedicines. 2025 Apr 28;13(5):1067. doi: 10.3390/biomedicines13051067.

Utilizing Feature Selection Techniques for AI-Driven Tumor Subtype Classification: Enhancing Precision in Cancer Diagnostics.

Biomolecules. 2025 Jan 8;15(1):81. doi: 10.3390/biom15010081.

Integrating proteomics and explainable artificial intelligence: a comprehensive analysis of protein biomarkers for endometrial cancer diagnosis and prognosis.

Front Mol Biosci. 2024 Jun 3;11:1389325. doi: 10.3389/fmolb.2024.1389325. eCollection 2024.

Identification of protein signatures for lung cancer subtypes based on BPSO method.

PLoS One. 2023 Dec 7;18(12):e0294243. doi: 10.1371/journal.pone.0294243. eCollection 2023.
