BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven, Belgium.
Medical Imaging Research Center, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae512.
Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially now that large-scale multi-omics data are abundant. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders (external factors unrelated to the condition, e.g. batch effect or age) on clustering is often overlooked, which can introduce bias and lead to spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies are (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating the deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, are insufficient for confounder removal. In summary, our study contributes innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.
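To make strategy (i) concrete, here is a minimal sketch of removing latent features correlated with a confounder. It assumes a numeric confounder (such as age), uses Pearson correlation as the association measure, and applies an illustrative cutoff of |r| > 0.3. The helper name `drop_confounded_latents`, the cutoff, and the toy data are hypothetical and are not taken from the paper. A categorical confounder such as batch would need a different association test.

```python
import numpy as np


def drop_confounded_latents(z, confounder, threshold=0.3):
    """Drop latent dimensions whose |Pearson r| with a confounder exceeds `threshold`.

    Hypothetical helper for illustration; the Pearson criterion and the 0.3
    cutoff are assumptions, not the published method's settings.
    Returns (filtered latents, indices of kept dimensions, per-dimension r).
    """
    z = np.asarray(z, dtype=float)            # shape (n_samples, n_latent)
    c = np.asarray(confounder, dtype=float)   # shape (n_samples,), numeric confounder
    zc = z - z.mean(axis=0)                   # center each latent dimension
    cc = c - c.mean()                         # center the confounder
    cov = (zc * cc[:, None]).sum(axis=0)
    denom = np.sqrt((zc ** 2).sum(axis=0) * (cc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        # Constant (zero-variance) dimensions get r = 0 instead of NaN.
        r = np.where(denom > 0, cov / denom, 0.0)
    keep = np.flatnonzero(np.abs(r) <= threshold)
    return z[:, keep], keep, r


# Toy data: 4 latent dimensions standing in for a VAE embedding, where
# dimension 1 is driven by a simulated linear age effect.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 80, size=n)
z = rng.normal(size=(n, 4))
z[:, 1] = (age - age.mean()) / age.std() + 0.1 * rng.normal(size=n)

z_clean, kept, r = drop_confounded_latents(z, age)
print("correlation with age:", np.round(r, 2))
print("kept dimensions:", kept)
```

Clustering (for example k-means) would then run on `z_clean` instead of the full embedding. A correlation filter like this only catches linear association, so a nonlinearly confounded dimension could slip through. That limitation is one motivation for the model-level alternatives listed as strategies (ii) to (iv).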