Department of Computer Science, Brown University, Providence, RI 02912, USA.
Bioinformatics. 2010 Sep 15;26(18):i446-52. doi: 10.1093/bioinformatics/btq368.
Segmental duplications > 1 kb in length with ≥ 90% sequence identity between copies comprise nearly 5% of the human genome. They are frequently found in large, contiguous regions known as duplication blocks that can contain mosaic patterns of thousands of segmental duplications. Reconstructing the evolutionary history of these complex genomic regions is a non-trivial, but important task.
We introduce parsimony and likelihood techniques to analyze the evolutionary relationships between duplication blocks. Both techniques rely on a generic model of duplication in which long, contiguous substrings are copied and reinserted over large physical distances, allowing for a duplication block to be constructed by aggregating substrings of other blocks. For the likelihood method, we give an efficient dynamic programming algorithm to compute the weighted ensemble of all duplication scenarios that account for the construction of a duplication block. Using this ensemble, we derive the probabilities of various duplication scenarios. We formalize the task of reconstructing the evolutionary history of segmental duplications as an optimization problem on the space of directed acyclic graphs. We use a simulated annealing heuristic to solve the problem for a set of segmental duplications in the human genome in both parsimony and likelihood settings.
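To make the parsimony setting concrete, the following is a minimal sketch of a dynamic program in the spirit of the duplication model described above. It is not the authors' algorithm. It assumes a simplified model in which a target block is built purely by concatenating contiguous substrings copied from source blocks, ignoring orientation and any other operations the full model may allow. The function name `min_copy_operations` and the representation of blocks as strings of single-character segment labels are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def min_copy_operations(target: Sequence[str],
                        sources: Sequence[Sequence[str]]) -> Optional[int]:
    """Fewest contiguous source substrings whose concatenation equals target.

    Simplified, assumed model: each operation copies one contiguous
    substring of some source block and appends it to the growing target.
    Returns None if the target cannot be built from the sources at all.
    """
    n = len(target)
    # best[i] = fewest copy operations needed to build target[:i]
    best: list = [None] * (n + 1)
    best[0] = 0
    for i in range(n):
        if best[i] is None:
            continue  # the prefix target[:i] is unreachable
        # Longest substring of any source that matches target starting at i.
        longest = 0
        for src in sources:
            for j in range(len(src)):
                k = 0
                while (i + k < n and j + k < len(src)
                       and target[i + k] == src[j + k]):
                    k += 1
                longest = max(longest, k)
        # Every prefix of a matching substring is itself a valid copy.
        for k in range(1, longest + 1):
            cost = best[i] + 1
            if best[i + k] is None or cost < best[i + k]:
                best[i + k] = cost
    return best[n]


# Hypothetical example: segments are labeled by single characters.
print(min_copy_operations("abcde", ["abc", "xcde"]))
```

In this sketch the table entry for each prefix of the target holds the fewest copy operations that produce it. A likelihood variant under the same assumptions would replace the minimum with a weighted sum over all decompositions, which is one way to read the "weighted ensemble" described above.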
Supplementary information is available at http://www.cs.brown.edu/people/braphael/supplements/.