Nguyen Thi Mai, Kim Nackhyoung, Kim Da Hae, Le Hoang Long, Piran Md Jalil, Um Soo-Jong, Kim Jin Hee
Department of Integrative Bioscience & Biotechnology, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea.
Department of Computer Science & Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea.
Biomedicines. 2021 Nov 20;9(11):1733. doi: 10.3390/biomedicines9111733.
Deep learning (DL) is a distinct class of machine learning that has achieved first-class performance in many fields of study. In epigenomics, the application of DL to assist physicians and scientists in human disease-relevant prediction tasks remained relatively unexplored until very recently. In this article, we critically review published studies that employed DL models for disease detection, subtype classification, and treatment response prediction using epigenomic data. A comprehensive search of PubMed, Scopus, Web of Science, Google Scholar, and arXiv.org was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Of the 1140 publications initially identified, we included 22 articles in our review. DNA methylation and RNA-sequencing data were the most frequently used to train the predictive models. The reviewed models achieved high accuracy, ranging from 88.3% to 100.0% for disease detection tasks, from 69.5% to 97.8% for subtype classification tasks, and from 80.0% to 93.0% for treatment response prediction tasks. We generated a workflow for developing a predictive model that encompasses all steps, from first defining human disease-related tasks to finally evaluating model performance. DL holds promise for transforming epigenomic big data into valuable knowledge that will enhance the development of translational epigenomics.