Debraj GuhaThakurta
Research Genetics Division, Rosetta Inpharmatics LLC, Merck & Co., Inc., 401 Terry Avenue North, Seattle, WA 98109, USA.
Nucleic Acids Res. 2006 Jul 19;34(12):3585-98. doi: 10.1093/nar/gkl372. Print 2006.
Identification and annotation of all the functional elements in the genome, including genes and regulatory sequences, is a fundamental challenge in genomics and computational biology. Because regulatory elements are frequently short and variable, identifying and discovering them with computational algorithms is difficult. Nevertheless, significant advances have been made in computational methods for modeling and detecting DNA regulatory elements. The availability of complete genome sequences from multiple organisms, together with mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, has driven the development of methods that use these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in identifying cis-regulatory modules and higher-order structures of regulatory sequences, which is essential to understanding transcriptional regulation in metazoan genomes. This article reviews computational approaches for modeling and identifying genomic regulatory elements, with an emphasis on recent developments and current challenges.
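Why short, variable elements are hard to detect can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and is not a method from the review. It assumes a random genome with independent, uniformly distributed bases (p = 0.25 each), which real genomes are not. Under that assumption it computes how many chance matches a consensus site, written in standard IUPAC codes, would have. The helper names `match_probability` and `expected_matches` and the example sites are hypothetical.

```python
# Illustrative sketch: expected chance matches of a consensus site in random DNA.
# Assumption: independent, uniformly distributed bases (p = 0.25 each).

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(consensus: str) -> float:
    """Probability that one random position matches the consensus."""
    p = 1.0
    for code in consensus.upper():
        # Each position matches with probability (allowed bases) / 4.
        p *= len(IUPAC[code]) / 4.0
    return p


def expected_matches(consensus: str, seq_length: int, both_strands: bool = True) -> float:
    """Expected chance occurrences in a random sequence of seq_length bases.

    Matches on each strand are counted separately.
    """
    positions = seq_length - len(consensus) + 1
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(consensus)


if __name__ == "__main__":
    genome_length = 3_000_000_000  # roughly the size of a mammalian genome
    for site in ("TGACTCA", "TGASTCA"):  # hypothetical exact vs. one-degenerate site
        print(site, round(expected_matches(site, genome_length)))
    # prints:
    # TGACTCA 366211
    # TGASTCA 732422
```

Even an exact 7-bp site is expected to occur hundreds of thousands of times by chance in a genome of this size. Allowing a single two-way degenerate position (S = C or G) doubles that number. This illustrates why sequence matching alone cannot separate functional sites from background, and why auxiliary data such as cross-species conservation, expression profiles or experimental binding maps are used to narrow the candidates.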