Zhao Yize, Chang Changgee, Long Qi
Weill Cornell Medicine, New York, NY.
University of Pennsylvania Perelman School of Medicine, Philadelphia, PA.
JCO Precis Oncol. 2019 Oct 24;3. doi: 10.1200/PO.19.00018. eCollection 2019 Oct.
High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and to yield more biologically interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.