Benson G, Dong L
Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY 10029-6574, USA.
Proc Int Conf Intell Syst Mol Biol. 1999:44-53.
One of the less well understood mutational transformations that act upon DNA is tandem duplication. In this process, a stretch of DNA is duplicated to produce two or more adjacent copies, resulting in a tandem repeat. Over time, the copies undergo additional mutations so that, typically, multiple approximate tandem copies are present. An interesting feature of tandem repeats is that the duplicated copies are preserved together, making it possible to do "phylogenetic analysis" on a single sequence. This involves using the pattern of mutations among the copies to determine a minimal or a most likely history for the repeat. A history tries to describe the interwoven pattern of substitutions, indels, and duplication events in such a way as to minimize the number of identical mutations that arise independently. Because the copies are adjacent and ordered, the history problem cannot be solved by standard phylogeny algorithms. In this paper, we introduce several versions of the tandem repeat history problem, develop algorithmic solutions and evaluate their performance. We also develop ways to visualize important features of a history with the goal of discovering properties of the duplication mechanism.
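To make the duplication process described above concrete, here is a minimal Python sketch. It is not taken from the paper, and the function names `tandem_duplicate` and `substitute` are hypothetical. The sketch copies a segment into adjacent tandem copies and then applies a later point substitution, so the copies become approximate. Indels and the history-reconstruction problem itself are deliberately not modeled.

```python
def tandem_duplicate(seq: str, start: int, end: int, copies: int = 2) -> str:
    """Replace seq[start:end] with `copies` adjacent copies of that segment (hypothetical helper)."""
    if not (0 <= start < end <= len(seq)):
        raise ValueError("invalid segment")
    if copies < 2:
        raise ValueError("a tandem duplication yields at least two copies")
    unit = seq[start:end]
    return seq[:start] + unit * copies + seq[end:]


def substitute(seq: str, pos: int, base: str) -> str:
    """Apply a single point substitution at 0-based position `pos` (hypothetical helper)."""
    return seq[:pos] + base + seq[pos + 1:]


if __name__ == "__main__":
    ancestor = "GGACGTTAC"
    # Duplicate the unit "ACG" (positions 2..4) into three adjacent copies.
    repeat = tandem_duplicate(ancestor, 2, 5, copies=3)
    print(repeat)  # GGACGACGACGTTAC
    # A later substitution in the second copy (ACG -> ATG) leaves approximate copies.
    mutated = substitute(repeat, 6, "T")
    print(mutated)  # GGACGATGACGTTAC
```

Because each copy in the output stays next to its neighbors and in order, the mutation pattern across copies is what a history must explain. This is the property that separates the problem from standard phylogeny, where the leaves carry no positional order.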