Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
Cornell Tech, Cornell University, New York, NY 14853, USA.
Cell Rep Methods. 2024 Jun 17;4(6):100781. doi: 10.1016/j.crmeth.2024.100781. Epub 2024 May 17.
We present a strategy for integrating genome-wide multi-omics data that adaptively fuses hidden-layer features learned from high-dimensional omics data by a multi-task encoder. On eight benchmark cancer datasets, the proposed framework outperformed the compared algorithms in cancer subtyping. Building on these subtyping results, we establish a robust pipeline for identifying genome-wide biomarkers, which yields 195 significant biomarkers. We further conduct a comprehensive analysis of the importance of each omics type and of non-coding region features at the genome-wide level during cancer subtyping. The analysis shows that both omics and non-coding region features substantially affect cancer development and survival prognosis. This study highlights the potential and practical implications of integrating genome-wide data in cancer research and demonstrates the value of comprehensive genomic characterization. Our findings also offer useful perspectives for multi-omics analysis with deep learning methods.