Rostami Zahra, Mukund Kavitha, Masnadi-Shirazi Maryam, Subramaniam Shankar
Department of Computer Science and Engineering, University of California San Diego, San Diego, California, United States of America.
Department of Bioengineering, University of California San Diego, San Diego, California, United States of America.
PLoS One. 2025 Jul 23;20(7):e0327773. doi: 10.1371/journal.pone.0327773. eCollection 2025.
The heterogeneity of breast cancer poses several challenges for detection and treatment. With next-generation sequencing, we can now map the transcriptional profile of each patient's breast tissue, which has the potential to identify and characterize cancer subtypes. However, the high dimensionality of these transcriptomic data and the heterogeneity among the molecular profiles of breast cancers pose a barrier to identifying minimal markers and their mechanistic consequences. In this study, we develop an autoencoder to identify a reduced set of gene markers that characterize the four major breast cancer subtypes with an accuracy of 82.38%. The reduced feature space created by our model captures the functional characteristics of each breast cancer subtype, highlighting mechanisms that are unique to each subtype as well as those that are shared. Our high prediction accuracy shows that our markers can be valuable for breast cancer subtype detection and have the potential to provide insights into the mechanisms associated with each subtype.
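As a rough illustration of the general idea (compressing expression profiles through an autoencoder bottleneck and then reading off a reduced gene set), the following is a minimal sketch only. The abstract does not describe the architecture, training procedure, or marker-selection rule, so everything here is an assumption: the synthetic data, the single tanh encoder layer, the plain gradient-descent training, and the weight-norm ranking heuristic. The names `make_synthetic_expression`, `train_autoencoder`, and `rank_genes` are hypothetical.

```python
import numpy as np


def make_synthetic_expression(n_per_subtype=30, n_subtypes=4,
                              markers_per_subtype=5, n_noise_genes=10, seed=0):
    """Toy stand-in for a samples-by-genes expression matrix (not real data)."""
    rng = np.random.default_rng(seed)
    n_genes = n_subtypes * markers_per_subtype + n_noise_genes
    labels = np.repeat(np.arange(n_subtypes), n_per_subtype)
    X = rng.normal(0.0, 1.0, (labels.size, n_genes))
    # Assumption: each subtype up-regulates its own block of marker genes.
    for s in range(n_subtypes):
        cols = slice(s * markers_per_subtype, (s + 1) * markers_per_subtype)
        X[labels == s, cols] += 3.0
    # Standardize every gene to zero mean and unit variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, labels


def train_autoencoder(X, n_latent=4, lr=0.01, epochs=500, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder), full-batch GD."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W_enc = rng.normal(0.0, 0.1, (n_genes, n_latent))
    W_dec = rng.normal(0.0, 0.1, (n_latent, n_genes))
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc)            # latent (reduced) representation
        err = Z @ W_dec - X               # reconstruction error
        # Loss: squared error summed over genes, averaged over samples.
        losses.append(float(np.sum(err ** 2) / n_samples))
        grad_out = 2.0 * err / n_samples
        grad_W_dec = Z.T @ grad_out
        grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)  # back through tanh
        grad_W_enc = X.T @ grad_pre
        W_dec -= lr * grad_W_dec
        W_enc -= lr * grad_W_enc
    return W_enc, W_dec, losses


def rank_genes(W_enc, k):
    """Heuristic: score each gene by the L2 norm of its encoder weights."""
    scores = np.linalg.norm(W_enc, axis=1)
    return np.argsort(scores)[::-1][:k]


X, labels = make_synthetic_expression()
W_enc, W_dec, losses = train_autoencoder(X)
top_genes = rank_genes(W_enc, k=10)
print("reconstruction loss: first epoch", losses[0], "last epoch", losses[-1])
print("indices of the 10 highest-scoring genes:", top_genes)
```

In a sketch like this, the latent matrix `np.tanh(X @ W_enc)` plays the role of the reduced feature space, and a separate classifier would be trained on it (or on the selected genes) to predict subtype. That step is omitted here because the abstract does not specify how the reported accuracy was obtained.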