Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada.
Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
J Biomed Inform. 2022 Jan;125:103958. doi: 10.1016/j.jbi.2021.103958. Epub 2021 Nov 25.
Breast cancer is a highly heterogeneous disease. Subtyping the disease and identifying the genomic features that drive these subtypes are critical for precision oncology in breast cancer. This study focuses on developing a new computational approach for breast cancer subtyping. We proposed using Bayesian tensor factorization (BTF) to integrate multi-omics data of breast cancer, comprising RNA-sequencing (RNA-seq) expression profiles, copy number variation, and DNA methylation measured in 762 breast cancer patients from The Cancer Genome Atlas. We applied a consensus clustering approach to the latent features factorized by BTF to identify breast cancer subtypes. Subtype-specific survival patterns of the patients were evaluated using Kaplan-Meier (KM) estimators. The proposed approach was compared with other state-of-the-art approaches for cancer subtyping. The BTF-subtyping analysis identified 17 optimized latent components, which were used to reveal six major breast cancer subtypes. Of all the approaches compared, only the proposed approach yielded subtypes with significantly distinct survival patterns (p < 0.05). Statistical tests also showed that the identified clusters have significantly different distributions. Our results showed that the proposed approach is a promising strategy for efficiently using publicly available multi-omics data to identify breast cancer subtypes.
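The abstract does not describe how consensus clustering was configured. The following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes k-means as the base clusterer, an 80% patient subsample per resample, and 100 resamples, all chosen only for illustration, and it uses toy two-dimensional vectors in place of the BTF latent features. The function names `kmeans` and `consensus_matrix` are hypothetical. Entry `M[i][j]` is the fraction of resamples containing both patients `i` and `j` in which they were assigned to the same cluster.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first seeding (assumed base clusterer)."""
    # Seed: one random point, then repeatedly the point farthest from all chosen centers.
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each center to the mean of its members (an empty cluster keeps its center).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(features, k, n_resamples=100, subsample=0.8, seed=0):
    """Hypothetical helper: co-clustering frequency over random patient subsamples."""
    rng = random.Random(seed)
    n = len(features)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were drawn together
    m = max(k, int(subsample * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample patients without replacement
        labels = kmeans([features[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

For example, `consensus_matrix([(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)], k=2)` should give values near 1 within each well-separated group and near 0 across groups. In consensus clustering, a final partition is usually obtained by clustering the consensus matrix itself (for example, hierarchical clustering on `1 - M`), and the number of clusters is chosen from how stable `M` is across candidate values of `k`. Those steps are not shown here.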
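The subtype-specific survival evaluation relies on the Kaplan-Meier product-limit estimator, which multiplies the survival probability by the factor (1 − d/n) at each event time, where d is the number of deaths and n is the number still at risk. The pure-Python sketch below illustrates that estimator. The names `kaplan_meier` and `km_by_subtype` are hypothetical, and the toy inputs stand in for patient follow-up times, event indicators, and subtype labels. A test for differences between curves (for example, a log-rank test) is not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all subjects sharing time t; subjects censored at t still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed
    return curve


def km_by_subtype(times, events, labels):
    """Fit one Kaplan-Meier curve per subtype label."""
    groups = {}
    for t, e, g in zip(times, events, labels):
        ts, es = groups.setdefault(g, ([], []))
        ts.append(t)
        es.append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in groups.items()}
```

For instance, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3. The subject censored at time 2 is counted as at risk for the death at that time, which follows the usual convention for ties.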