The identification of cis-regulatory elements: A review from a machine learning perspective.

Author information

Li Yifeng, Chen Chih-Yu, Kaye Alice M, Wasserman Wyeth W

Affiliations

Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada; Information and Communications Technologies, National Research Council of Canada, Ottawa, Ontario K1A 0R6, Canada.

Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada.

Publication information

Biosystems. 2015 Dec;138:6-17. doi: 10.1016/j.biosystems.2015.10.002. Epub 2015 Oct 21.

Abstract

The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have revealed that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, and insulators. These regulatory elements can play crucial roles in controlling gene expression in specific cell types, conditions, and developmental stages. Disruption of these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency of detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field.
