Navajas-Pérez Rafael, Paterson Andrew H
Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA.
Mol Genet Genomics. 2009 Jun;281(6):579-90. doi: 10.1007/s00438-009-0433-y. Epub 2009 Feb 26.
Tandem repeats often confound large genome assemblies. To test how current assemblies handle this fraction of DNA, we surveyed tandemly arrayed repetitive sequences in the whole-genome sequences of the green alga Chlamydomonas reinhardtii, the moss Physcomitrella patens, the monocots rice and sorghum, and the dicots Arabidopsis thaliana, poplar, grapevine, and papaya. Our results suggest that plant genome assemblies preferentially include tandem repeats composed of shorter monomeric units (especially dinucleotide and 9-30-bp repeats), whereas repeats with longer monomeric units are more difficult to assemble. Although currently available sequencing technologies struggle with longer arrays of repeated DNA, major well-known repetitive elements, including centromeric and telomeric repeats as well as high-copy-number genes, were found to be reasonably well represented. A database of all tandem repeat sequences characterized here was created to benefit future comparative genomic analyses.
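To make the idea of classifying tandem repeats by monomer length concrete, the following is a minimal Python sketch. It is not the pipeline used in the study, which the abstract does not describe. It assumes exact (perfect) repeats only, a greedy left-to-right scan, and a minimum of three copies. The function names (`find_tandem_repeats`, `unit_length_class`, `summarize`) are hypothetical, and apart from the dinucleotide and 9-30-bp classes named in the abstract, the size bins are assumptions chosen for illustration.

```python
from collections import Counter


def find_tandem_repeats(seq, max_unit=30, min_copies=3):
    """Greedy scan for perfect tandem repeats.

    Returns a list of (start, unit, copies) tuples. Assumption: exact matches
    only; real repeat finders also tolerate mismatches and indels.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    pos = 0
    while pos < n:
        best_unit, best_copies, best_span = None, 0, 0
        for k in range(1, max_unit + 1):
            # Not enough sequence left for min_copies units of length k.
            if pos + k * min_copies > n:
                break
            unit = seq[pos:pos + k]
            copies = 1
            while seq.startswith(unit, pos + copies * k):
                copies += 1
            # Strict '>' keeps the shortest unit when spans tie,
            # so ATATAT... is reported as (AT)n rather than (ATAT)n.
            if copies >= min_copies and copies * k > best_span:
                best_unit, best_copies, best_span = unit, copies, copies * k
        if best_unit is None:
            pos += 1
        else:
            hits.append((pos, best_unit, best_copies))
            pos += best_span
    return hits


def unit_length_class(unit_len):
    """Assign a monomer length to a size class (bin edges are assumptions)."""
    if unit_len == 1:
        return "mononucleotide"
    if unit_len == 2:
        return "dinucleotide"
    if unit_len <= 8:
        return "3-8 bp"
    if unit_len <= 30:
        return "9-30 bp"
    return ">30 bp"


def summarize(seq, max_unit=30, min_copies=3):
    """Count detected tandem arrays per monomer-length class."""
    hits = find_tandem_repeats(seq, max_unit=max_unit, min_copies=min_copies)
    return Counter(unit_length_class(len(unit)) for _, unit, _ in hits)
```

Running `summarize` on each assembly and comparing the per-class counts across species would give a rough view of which monomer sizes are represented. A survey at genome scale would need an approximate-matching repeat finder and a far more efficient search than this quadratic scan.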