Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
Biochim Biophys Acta Rev Cancer. 2021 Dec;1876(2):188588. doi: 10.1016/j.bbcan.2021.188588. Epub 2021 Jul 7.
The recent surge in genome-wide technologies for mapping the epigenome, and the resulting data from cancer samples, has created an opportunity to understand the roles of epigenetic processes in cancer. However, the complexity, high dimensionality, sparsity, and noise of these data pose challenges for extensive integrative analyses. Machine learning (ML) algorithms are particularly suited to epigenomic data analysis because of their flexibility and their ability to learn underlying hidden structure. We discuss four overlapping but distinct major categories of ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in the analysis of cancer epigenomics data, with the aim of providing an overview of how ML approaches can be used to explore fundamental questions about the roles of the epigenome in cancer biology and medicine.