Department of Computer Science, Stanford University, Stanford, California, United States of America.
PLoS Comput Biol. 2010 Dec 2;6(12):e1001025. doi: 10.1371/journal.pcbi.1001025.
Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum-likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++, we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
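The abstract describes a bottom-up pipeline: score each alignment column, then use dynamic programming to group contiguous high-scoring positions into constrained elements. The sketch below illustrates only that second step, under simplifying assumptions of our own: per-position constraint scores are taken as given input, and elements are chosen by a two-state dynamic program that maximizes, over all chosen segments, the summed excess of each score above a fixed threshold minus a fixed per-element penalty. The function name call_segments and the parameters threshold and open_penalty are hypothetical. GERP++ itself evaluates candidate breakpoints and ranks elements by statistical significance, which this sketch does not attempt.

```python
from typing import List, Tuple


def call_segments(scores: List[float], threshold: float,
                  open_penalty: float) -> List[Tuple[int, int]]:
    """Pick non-overlapping half-open (start, end) segments of `scores`.

    Illustrative objective (an assumption, not GERP++'s published
    criterion): maximize, over chosen segments,
        sum(scores[i] - threshold for i in segment) - open_penalty.
    """
    n = len(scores)
    neg = float("-inf")
    # out_[i] / in_[i]: best objective over scores[:i] when position i-1
    # lies outside / inside a segment. Index 0 is the empty prefix.
    out_ = [0.0] + [neg] * n
    in_ = [neg] * (n + 1)
    out_from_in = [False] * (n + 1)  # out_[i] reached by closing a segment
    in_is_start = [False] * (n + 1)  # in_[i] reached by opening a segment

    for i in range(1, n + 1):
        gain = scores[i - 1] - threshold
        # Position i-1 outside: the previous position was outside or inside.
        if in_[i - 1] > out_[i - 1]:
            out_[i], out_from_in[i] = in_[i - 1], True
        else:
            out_[i] = out_[i - 1]
        # Position i-1 inside: extend the open segment or pay to open one.
        extend = in_[i - 1] + gain
        start = out_[i - 1] + gain - open_penalty
        if start > extend:
            in_[i], in_is_start[i] = start, True
        else:
            in_[i] = extend

    # Trace back from the better final state to recover the segments.
    segments: List[Tuple[int, int]] = []
    state_in = in_[n] > out_[n]
    end = None
    i = n
    while i > 0:
        if state_in:
            if end is None:
                end = i  # this segment covers positions up to i-1
            if in_is_start[i]:
                segments.append((i - 1, end))
                end = None
                state_in = False  # the position before a start is outside
        else:
            state_in = out_from_in[i]
        i -= 1
    segments.reverse()
    return segments
```

For example, call_segments([0.1, 2.0, 3.0, 2.5, 0.0, -1.0, 4.0, 3.5, 0.2], threshold=1.0, open_penalty=1.0) returns [(1, 4), (6, 8)]: the dip at positions 4 and 5 (combined excess -3.0) costs more than the 1.0 penalty for opening a second element, so the two runs are reported separately. A larger open_penalty makes each additional element more expensive, so the program tends to report fewer elements, either merging runs across short dips or dropping weak runs.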