A. K. Choudhury School of Information Technology, University of Calcutta, JD-2, Sector III, Salt Lake City, Kolkata 700106, India.
Novo Nordisk Research Center Seattle, Inc., 530 Fairview Ave N # 5000, Seattle, WA 98109, United States.
Genomics. 2020 Jul;112(4):2833-2841. doi: 10.1016/j.ygeno.2020.03.021. Epub 2020 Mar 29.
Gene expression analysis plays a significant role in providing molecular insights into cancer. Various genetic and epigenetic factors, studied collectively as multi-omics, affect gene expression and give rise to cancer phenotypes. Recent growth in the understanding of multi-omics provides a resource for integration in interdisciplinary biology, because together these data layers can give a comprehensive picture of an organism's developmental and disease biology in cancer. Such large-scale multi-omics data can be obtained from public consortia such as The Cancer Genome Atlas (TCGA) and from several other platforms. Integrating multi-omics data from varied platforms remains challenging because of the high noise and sensitivity of the platforms used. Currently, a robust integrative predictive model for estimating gene expression from these genetic and epigenetic data is lacking. In this study, we developed a deep learning-based predictive model using a Deep Denoising Auto-encoder (DDAE) and a Multi-layer Perceptron (MLP) that quantitatively captures how genetic and epigenetic alterations correlate with the directionality of gene expression in liver hepatocellular carcinoma (LIHC). The DDAE was trained to extract significant features from the input omics data for estimating gene expression. The MLP then used these features, learning by back-propagation, for regression and classification. We benchmarked the proposed model against state-of-the-art regression models. Finally, we evaluated the deep learning-based integration model for its disease classification capability, obtaining an accuracy of 95.1%.
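The two-stage idea described in the abstract (a denoising auto-encoder that learns features from corrupted inputs, followed by a perceptron trained by back-propagation on those features) can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data from `make_toy_data`, Gaussian input corruption, a single tanh hidden layer instead of a deep stack, the layer sizes, the learning rate, and the names `OneHiddenNet`, `mse` and `run_demo`.

```python
import math
import random


class OneHiddenNet:
    """Tiny network: input -> tanh hidden layer -> linear output, trained by SGD."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.w1, self.b1 = init(n_hidden, n_in), [0.0] * n_hidden
        self.w2, self.b2 = init(n_out, n_hidden), [0.0] * n_out

    def hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def forward(self, x):
        h = self.hidden(x)
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, x, target, lr):
        h = self.hidden(x)
        out = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(self.w2, self.b2)]
        err = [o - t for o, t in zip(out, target)]  # gradient of 0.5 * squared error
        # Back-propagate to the hidden layer before the output weights change.
        grad_h = [(1 - h[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
                  for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def mse(net, inputs, targets):
    """Squared error summed over outputs and averaged over samples."""
    total = 0.0
    for x, t in zip(inputs, targets):
        total += sum((o - ti) ** 2 for o, ti in zip(net.forward(x), t))
    return total / len(inputs)


def make_toy_data(rng, n_samples=200, n_features=8):
    """Hypothetical stand-in for omics features driven by two latent factors."""
    mix = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append([m[0] * z[0] + m[1] * z[1] + rng.gauss(0, 0.05) for m in mix])
        y.append([z[0] - 0.5 * z[1]])  # toy "expression" target, not real data
    return X, y


def run_demo(seed=0, epochs=40, noise_sd=0.2, lr=0.02):
    rng = random.Random(seed)
    X, y = make_toy_data(rng)
    dae = OneHiddenNet(len(X[0]), 4, len(X[0]), rng)  # auto-encoder
    mlp = OneHiddenNet(4, 4, 1, rng)                  # regression head
    dae_before = mse(dae, X, X)
    for _ in range(epochs):
        for x in X:
            noisy = [v + rng.gauss(0, noise_sd) for v in x]  # corrupt the input
            dae.train_step(noisy, x, lr)                     # target is the clean input
    dae_after = mse(dae, X, X)
    codes = [dae.hidden(x) for x in X]  # learned features; encoder is now frozen
    mlp_before = mse(mlp, codes, y)
    for _ in range(epochs):
        for c, t in zip(codes, y):
            mlp.train_step(c, t, lr)
    mlp_after = mse(mlp, codes, y)
    return dae_before, dae_after, mlp_before, mlp_after


if __name__ == "__main__":
    print(run_demo())
```

Reusing one small network class for both stages keeps the sketch short: the auto-encoder is simply the network trained with a corrupted input and a clean target, and its hidden activations serve as the features passed to the second network. A practical implementation would more likely use a deep learning framework, deeper encoders, and held-out data for evaluation.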