Zhang Xiaoyu, Xing Yuting, Sun Kai, Guo Yike
Data Science Institute, Imperial College London, London SW7 2AZ, UK.
Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China.
Cancers (Basel). 2021 Jun 18;13(12):3047. doi: 10.3390/cancers13123047.
High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known in machine learning as "the curse of dimensionality". To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed, which captures biomedical information from high-dimensional omics data using a deep embedding module and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a lower-dimensional latent space. Based on this new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with a multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks (classification, regression and survival prediction) and achieved better performance with the multi-task strategy than when the tasks were trained individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
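The general pattern described above (a shared embedding that compresses several omics types into one low-dimensional latent space, feeding several task-specific heads whose losses are combined) can be illustrated with a minimal NumPy sketch of hard parameter sharing. This is not the OmiEmbed implementation: the names `MultiTaskOmicsSketch`, `dense`, `cross_entropy` and `multitask_loss`, the tanh encoders, the layer sizes and the equal loss weights are all hypothetical choices made for illustration. The survival-prediction head is omitted, and only the forward pass and the combined loss are shown; in practice all parameters would be updated jointly by backpropagating the summed loss, typically with an autodiff library.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Hypothetical helper: small random weights and a zero bias."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

class MultiTaskOmicsSketch:
    """Shared embedding plus task heads (illustrative, not OmiEmbed itself)."""

    def __init__(self, omics_dims, latent_dim, n_classes):
        # One encoder per omics type (e.g. gene expression, DNA methylation)
        self.encoders = [dense(d, latent_dim) for d in omics_dims]
        # Fusion layer: per-omics codes -> one shared low-dimensional embedding
        self.fuse = dense(latent_dim * len(omics_dims), latent_dim)
        # Task-specific heads that all read the same embedding
        self.cls_head = dense(latent_dim, n_classes)  # e.g. tumour type
        self.reg_head = dense(latent_dim, 1)          # e.g. age

    def embed(self, omics):
        # Encode each omics matrix (samples x features) separately, then fuse
        codes = [np.tanh(x @ W + b) for x, (W, b) in zip(omics, self.encoders)]
        W, b = self.fuse
        return np.tanh(np.concatenate(codes, axis=1) @ W + b)

    def forward(self, omics):
        z = self.embed(omics)
        W, b = self.cls_head
        logits = z @ W + b
        W, b = self.reg_head
        reg = (z @ W + b).ravel()
        return z, logits, reg

def cross_entropy(logits, labels):
    """Mean negative log-likelihood under a numerically stable softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(logits, labels, reg, targets, weights=(1.0, 1.0)):
    """Weighted sum of per-task losses; the weights are assumptions."""
    mse = np.mean((reg - targets) ** 2)
    return weights[0] * cross_entropy(logits, labels) + weights[1] * mse
```

For example, eight samples with 500 expression and 300 methylation features are reduced to a 16-dimensional embedding shared by both heads:

```python
n = 8
omics = [rng.normal(size=(n, 500)), rng.normal(size=(n, 300))]
model = MultiTaskOmicsSketch(omics_dims=[500, 300], latent_dim=16, n_classes=4)
z, logits, reg = model.forward(omics)
loss = multitask_loss(logits, rng.integers(0, 4, n), reg, rng.normal(size=n))
print(z.shape, logits.shape, reg.shape)  # (8, 16) (8, 4) (8,)
```

Because every head reads the same embedding, gradients from each task would shape the shared encoder during training, which is the intuition behind training the tasks together rather than individually.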