Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, Guangzhou, 381, China.
BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):129. doi: 10.1186/s12911-020-1114-3.
With the rapid development of sequencing technologies, collecting diverse types of cancer omics data has become more cost-effective. Many computational methods have attempted to represent and fuse multiple omics into a comprehensive view of cancer. However, different types of omics are both related and heterogeneous. Most existing methods do not account for the differences between omics, so the biological knowledge carried by individual omics may not be fully exploited. Moreover, for a given task (e.g., predicting overall survival), these methods tend to rely on sample similarity or domain knowledge to learn a more reasonable representation of omics, which is not sufficient.
To learn more useful representations of individual omics and fuse them to improve predictive ability, we proposed an autoencoder-based method named MOSAE (Multi-omics Supervised Autoencoder). In our method, a specific autoencoder was designed for each omics according to its dimensionality to generate omics-specific representations. A supervised autoencoder was then constructed on top of each specific autoencoder, using labels to force it to learn both omics-specific and task-specific representations. Finally, the representations of the different omics generated by the supervised autoencoders were fused in a traditional but powerful way, and the fused representation was used for subsequent predictive tasks.
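The three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single hidden layer, the tanh activation, the binary label, the weighting factor `alpha`, the latent sizes, and the use of concatenation as the fusion step are all assumptions made for illustration, and the names `init_sae`, `encode`, and `sae_loss` are hypothetical. The sketch only evaluates the combined objective (reconstruction error plus a label-driven term) and builds the fused representation; it does not train the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sae(n_features, n_latent, rng):
    """Random weights for a one-hidden-layer supervised autoencoder (hypothetical layout)."""
    scale = 1.0 / np.sqrt(n_features)
    return {
        "W_enc": rng.normal(0.0, scale, (n_features, n_latent)),
        "b_enc": np.zeros(n_latent),
        "W_dec": rng.normal(0.0, scale, (n_latent, n_features)),
        "b_dec": np.zeros(n_features),
        "w_clf": rng.normal(0.0, scale, n_latent),
        "b_clf": 0.0,
    }

def encode(params, x):
    """Map one omics matrix (samples x features) to its latent representation."""
    return np.tanh(x @ params["W_enc"] + params["b_enc"])

def sae_loss(params, x, y, alpha=1.0):
    """Reconstruction loss plus alpha times a supervised (label) loss."""
    z = encode(params, x)
    x_hat = z @ params["W_dec"] + params["b_dec"]
    logits = z @ params["w_clf"] + params["b_clf"]
    recon = np.mean((x - x_hat) ** 2)
    # Numerically stable binary cross-entropy computed from logits.
    bce = np.mean(np.maximum(logits, 0.0) - logits * y
                  + np.log1p(np.exp(-np.abs(logits))))
    return recon + alpha * bce

# Toy stand-in data: two omics types with different dimensionality.
n_samples = 8
omics = {
    "expression": rng.normal(size=(n_samples, 200)),
    "methylation": rng.normal(size=(n_samples, 50)),
}
labels = rng.integers(0, 2, size=n_samples).astype(float)

# Assumed rule: a wider omics type gets a wider bottleneck.
latent_sizes = {"expression": 16, "methylation": 8}

models = {name: init_sae(x.shape[1], latent_sizes[name], rng)
          for name, x in omics.items()}
losses = {name: sae_loss(models[name], x, labels) for name, x in omics.items()}

# Assumed fusion step: concatenate the per-omics latent representations.
fused = np.concatenate([encode(models[name], omics[name]) for name in omics],
                       axis=1)
```

In this sketch each omics type has its own encoder, decoder, and label head, so the label term shapes each representation separately before fusion. The fused matrix would then be passed to whatever downstream predictor is used for the endpoint of interest.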
We applied our method to the TCGA Pan-Cancer dataset to predict four clinical outcome endpoints: overall survival (OS), progression-free interval (PFI), disease-free interval (DFI), and disease-specific survival (DSS). Compared with traditional and state-of-the-art methods, MOSAE achieved better predictive performance. We also tested the effect of each improvement; all of them had a positive effect on predictive performance.
Predicting clinical outcome endpoints is very important for precision and personalized medicine, and multi-omics fusion is an effective way to address this problem. MOSAE is a powerful multi-omics fusion method that generates both omics-specific and task-specific representations for a given endpoint prediction task and improves predictive performance.