Kvikstad Erika M, Tyekucheva Svitlana, Chiaromonte Francesca, Makova Kateryna D
Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania, USA.
PLoS Comput Biol. 2007 Sep;3(9):1772-82. doi: 10.1371/journal.pcbi.0030176. Epub 2007 Jul 27.
Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate extensive regional variation in indel rates stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions; thus, the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. The term "indel" misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.
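The idea of using an outgroup to decide whether a small alignment gap is an insertion or a deletion can be illustrated with a simple parsimony rule. The sketch below is not the authors' pipeline; the function name `classify_gap`, the label strings, and the input convention (three equal-length aligned segments covering one gap region, with `-` marking gap columns) are assumptions made for illustration only. It assumes the macaque state reflects the ancestral state and ignores complications such as independent indels in the outgroup lineage.

```python
def classify_gap(human: str, chimp: str, macaque: str) -> str:
    """Assign polarity to one human-chimp gap using macaque as the outgroup.

    Hypothetical helper: each argument is the aligned segment for the same
    gap region, so all three strings must have the same non-zero length.
    """
    if not human or not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("segments must be non-empty and of equal length")

    def state(segment: str) -> str:
        # "gap" if every column is '-', "seq" if no column is '-',
        # otherwise "mixed" (partially gapped, which we do not resolve).
        gaps = segment.count("-")
        if gaps == len(segment):
            return "gap"
        if gaps == 0:
            return "seq"
        return "mixed"

    h, c, m = state(human), state(chimp), state(macaque)
    if "mixed" in (h, c, m) or h == c:
        # Either partially overlapping gaps or no human-chimp difference.
        return "ambiguous"

    if h == "seq":
        # Human has bases where chimp has a gap.
        # Outgroup has bases too: the bases are ancestral, chimp lost them.
        # Outgroup has a gap: the bases are new in the human lineage.
        return "chimp_deletion" if m == "seq" else "human_insertion"

    # Human has a gap where chimp has bases.
    return "human_deletion" if m == "seq" else "chimp_insertion"


if __name__ == "__main__":
    print(classify_gap("ACG", "---", "---"))
    print(classify_gap("---", "ACG", "ACT"))
```

Under this rule, only gaps labelled `human_insertion` or `human_deletion` would count toward human-lineage indels of the kind the abstract describes; without the outgroup, both would look like the same gap in a human-chimpanzee alignment.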