Park Jeong-Woon, Rhee Je-Keun
Department of Bioinformatics & Life Science, Soongsil University, Seoul 06987, Republic of Korea.
Biology (Basel). 2024 Oct 7;13(10):799. doi: 10.3390/biology13100799.
Breast cancer is a heterogeneous disease composed of various biologically distinct subtypes, each characterized by unique molecular features. Its formation and progression involve a complex, multistep process that includes the accumulation of numerous genetic and epigenetic alterations. Although integrating RNA-seq transcriptome data with ATAC-seq epigenetic information provides a more comprehensive understanding of gene regulation and its impact across different conditions, no classification model has yet been developed for breast cancer intrinsic subtypes based on such integrative analyses. In this study, we employed machine learning algorithms to predict intrinsic subtypes through the integrative analysis of ATAC-seq and RNA-seq data. We identified 10 signature genes using recursive feature elimination with cross-validation (RFECV) and a support vector machine (SVM), with features ranked by SHAP (SHapley Additive exPlanations) importance. Furthermore, we found that these genes were primarily associated with immune responses, hormone signaling, cancer progression, and cellular proliferation.
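The feature selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic stand-ins for an integrated ATAC-seq/RNA-seq feature matrix, the SVM is assumed to be linear, and the class name `ShapLinearSVC` and attribute `shap_importance_` are hypothetical. Instead of calling the SHAP library, the sketch uses the closed-form SHAP value of a linear model under the feature-independence assumption, where the attribution of feature j for sample i is w_j (x_ij - mean_j).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


class ShapLinearSVC(LinearSVC):
    """Linear SVM that stores a SHAP-style importance per feature after fitting.

    Hypothetical helper for illustration only. For a linear model with
    independent features, the SHAP value of feature j for sample i is
    w_j * (x_ij - mean_j), so mean |SHAP| has a closed form.
    """

    def fit(self, X, y, sample_weight=None):
        super().fit(X, y, sample_weight=sample_weight)
        X = np.asarray(X)
        # Mean absolute deviation of each feature from its training mean.
        mean_abs_dev = np.abs(X - X.mean(axis=0)).mean(axis=0)
        # One weight row per one-vs-rest classifier (one row if binary).
        abs_coef = np.abs(np.atleast_2d(self.coef_))
        # Mean |SHAP| per feature, summed over the one-vs-rest classifiers.
        self.shap_importance_ = abs_coef.sum(axis=0) * mean_abs_dev
        return self


# Synthetic stand-in data (assumption): 200 samples, 30 features, 3 classes.
X, y = make_classification(
    n_samples=200,
    n_features=30,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# RFECV drops the lowest-ranked feature at each step and picks the feature
# count with the best cross-validated accuracy.
selector = RFECV(
    estimator=ShapLinearSVC(dual=False, max_iter=5000),
    step=1,
    min_features_to_select=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    importance_getter="shap_importance_",
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("number of selected features:", selector.n_features_)
print("selected feature indices:", selected.tolist())
```

With real data, the columns of `X` would be gene-level features and `selected` would map back to gene identifiers. A non-linear SVM kernel would need a model-agnostic SHAP estimator in place of the closed form used here.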