Khost Daniel E, Eickbush Danna G, Larracuente Amanda M
Department of Biology, University of Rochester, Rochester, New York 14627, USA.
Genome Res. 2017 May;27(5):709-721. doi: 10.1101/gr.213512.116. Epub 2017 Apr 3.
Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs are rapidly evolving and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and makes detailed study of satDNA structure difficult. Here, we use single-molecule sequencing long reads from Pacific Biosciences (PacBio) to determine the detailed structure of all major autosomal complex satDNA loci in [species name missing in source], with a particular focus on the [satellite name missing in source] and [satellite name missing in source] satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci and validate this assembly using molecular and computational approaches. We determined that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and that it is essential to validate assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of the array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggests recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin.