Prabhakar Shyam, Poulin Francis, Shoukry Malak, Afzal Veena, Rubin Edward M, Couronne Olivier, Pennacchio Len A
Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
Genome Res. 2006 Jul;16(7):855-63. doi: 10.1101/gr.4717506. Epub 2006 Jun 12.
Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%-80%) and true-positive rate (27%-67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points.
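The abstract reports sensitivity and true-positive rate for conserved-region predictions benchmarked against known regulatory elements, but it does not spell out how the two figures were computed. The following is a minimal sketch under stated assumptions, not the authors' Gumby pipeline. It assumes that sensitivity is the fraction of known elements overlapped by at least one prediction, and that true-positive rate is the fraction of predictions that overlap a known element. The function names and all coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def benchmark(predicted: List[Interval], known: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, true_positive_rate) under the assumed definitions.

    sensitivity        = known elements hit by any prediction / all known elements
    true_positive_rate = predictions hitting any known element / all predictions
    """
    known_hit = sum(any(overlaps(k, p) for p in predicted) for k in known)
    pred_hit = sum(any(overlaps(p, k) for k in known) for p in predicted)
    sensitivity = known_hit / len(known) if known else 0.0
    tp_rate = pred_hit / len(predicted) if predicted else 0.0
    return sensitivity, tp_rate


# Hypothetical coordinates, not data from the paper.
known = [(100, 200), (500, 650), (900, 1000), (1500, 1600)]
predicted = [(150, 250), (300, 350), (600, 700), (1200, 1300), (1550, 1580)]

sens, tpr = benchmark(predicted, known)
print(f"sensitivity={sens:.2f} true_positive_rate={tpr:.2f}")
# prints: sensitivity=0.75 true_positive_rate=0.60
```

In this toy case three of the four known elements are recovered and three of the five predictions are supported. This shows how a comparison can score well on one measure and poorly on the other, which is the trade-off the abstract describes between close and distant species comparisons.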