Department of Genetics, University of Georgia, Athens, GA 30602.
Proc Natl Acad Sci U S A. 2014 May 6;111(18):6684-9. doi: 10.1073/pnas.1321854111. Epub 2014 Apr 23.
The insertion of DNA into a genome can result in the duplication and dispersal of functional sequences through the genome. In addition, a deeper understanding of insertion mechanisms will inform methods of genetic engineering and plant transformation. Exploiting structural variations in numerous rice accessions, we have inferred and analyzed intermediate length (10-1,000 bp) insertions in plants. Insertions in this size class were found to be approximately equal in frequency to deletions, and compound insertion-deletions comprised only 0.1% of all events. Our findings indicate that, as observed in humans, tandem or partially tandem duplications are the dominant form of insertion (48%), although short duplications from ectopic donors account for a sizable fraction of insertions in rice (38%). Many nontandem duplications contain insertions from nearby DNA (within 200 bp) and can contain multiple donor sources, some of them distant, in single events. Although replication slippage is a plausible explanation for tandem duplications, the end homology required in such a model is most often absent and rarely exceeds 5 bp. However, end homology is commonly longer than expected by chance. Such findings lead us to favor a model of patch-mediated double-strand-break creation followed by nonhomologous end-joining. Additionally, a striking bias toward 31-bp partially tandem duplications suggests that errors in nucleotide excision repair may be resolved via a similar, but distinct, pathway. In summary, the analysis of recent insertions in rice suggests multiple underappreciated causes of structural variation in eukaryotes.
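The contrast between tandem duplications and their end homology can be made concrete with a small sketch. Everything below is an illustrative assumption, not the authors' pipeline: the helper names `tandem_donor`, `end_homology`, `p_homology_at_least`, and `expected_chance_homology` are hypothetical; an insertion is described by a 0-based position `pos` in the reference `ref` (derived sequence `ref[:pos] + ins + ref[pos:]`); and "end homology" is taken here to mean the direct repeat shared by the start of the duplicated unit and the bases immediately after it, which is the kind of repeat a slippage model would need. Under a null model of independent, uniformly distributed bases (also an assumption), each extra matching base has probability 1/4, so P(homology >= k) = (1/4)^k and the expected chance homology is the geometric sum 1/4 + 1/16 + ... = 1/3 bp, which gives a baseline for "longer than expected by chance".

```python
from typing import Optional, Tuple


def tandem_donor(ref: str, pos: int, ins: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: if `ins`, inserted at `pos`, is an exact copy of the
    reference sequence immediately to its left or right, return the donor
    segment as (start, end) coordinates in `ref`; otherwise return None."""
    n = len(ins)
    if n == 0:
        return None
    # Duplication of the segment ending at the insertion point.
    if pos - n >= 0 and ref[pos - n:pos] == ins:
        return (pos - n, pos)
    # Duplication of the segment starting at the insertion point.
    if pos + n <= len(ref) and ref[pos:pos + n] == ins:
        return (pos, pos + n)
    return None


def end_homology(ref: str, start: int, end: int) -> int:
    """Hypothetical definition: length of the direct repeat shared by the start
    of the unit ref[start:end] and the bases right after it, capped at the
    unit length."""
    k = 0
    while end + k < len(ref) and start + k < end and ref[start + k] == ref[end + k]:
        k += 1
    return k


def p_homology_at_least(k: int, p_match: float = 0.25) -> float:
    """Null-model probability that k consecutive bases match by chance,
    assuming independent bases with per-base match probability p_match."""
    return p_match ** k


def expected_chance_homology(p_match: float = 0.25) -> float:
    """Expected chance homology length, sum over k >= 1 of p_match**k.
    Ignores the cap at the unit length, which matters only for tiny units."""
    return p_match / (1.0 - p_match)


if __name__ == "__main__":
    ref = "TTTTGACCTAGACGGG"
    donor = tandem_donor(ref, 10, "GACCTA")  # a tandem copy of ref[4:10]
    print(donor)                             # (4, 10)
    print(end_homology(ref, *donor))         # 3 ("GAC" repeats after the unit)
    print(p_homology_at_least(3))            # 0.015625
    print(round(expected_chance_homology(), 4))  # 0.3333
```

In this toy example a 3-bp end homology would arise by chance with probability of about 1.6% under the uniform-base assumption; a real analysis would need to use the observed base composition of the genome and account for the ambiguity in where an insertion is placed within repetitive sequence.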