Ko Young-Joon, Kim Jung Sun, Kim Sangsoo
Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea.
Genomics Division, Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea.
Genomics Inform. 2017 Dec;15(4):128-135. doi: 10.5808/GI.2017.15.4.128. Epub 2017 Dec 29.
As next-generation sequencing technologies have advanced, enormous amounts of whole-genome sequence information for various species have been released. However, it is still difficult to assemble a whole genome precisely because of the inherent limitations of short-read sequencing technologies. Plant genomes in particular are far more complex than those of microorganisms or animals, owing to whole-genome duplications, repeat insertions, Numt (nuclear mitochondrial DNA) insertions, and similar events. In this study, we describe a new method for detecting misassembly sequence regions of with genotyping-by-sequencing, followed by MadMapper clustering. The misassembly candidate regions were cross-checked against BAC clone paired-end library sequences that had been mapped to the reference genome. The results were further verified using gene synteny relations between and . We conclude that this method will help detect misassembly regions and is applicable to incompletely assembled reference genomes from a variety of species.
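The logic of flagging candidate regions and then cross-checking them against mapped BAC end pairs can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that each genotyping-by-sequencing marker already has a physical position in the reference and a linkage-group label (for example from MadMapper clustering) expressed in the same chromosome names; it treats a run of consecutive markers whose linkage group disagrees with their assigned chromosome as a candidate misassembly. A BAC end pair is treated as discordant if its ends map to different chromosomes or lie outside an expected insert-size window. The names `Marker`, `BacEndPair`, `candidate_regions`, and `supporting_clones`, the insert-size bounds, the flank size, and the chromosome labels are all hypothetical, and read orientation is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    chrom: str          # chromosome the marker is placed on in the reference assembly
    pos: int            # physical position in bp
    linkage_group: str  # genetic group from clustering (assumed to use chromosome names)


@dataclass
class BacEndPair:
    clone: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def candidate_regions(markers):
    """Return (chrom, start, end, foreign_group) for runs of consecutive markers
    whose linkage group conflicts with the chromosome they are placed on."""
    by_chrom = {}
    for m in markers:
        by_chrom.setdefault(m.chrom, []).append(m)
    regions = []
    for chrom, ms in by_chrom.items():
        run = []  # consecutive conflicting markers that share one foreign group
        for m in sorted(ms, key=lambda x: x.pos):
            conflict = m.linkage_group != chrom
            # Close the current run when the conflict ends or the foreign group changes.
            if run and (not conflict or m.linkage_group != run[-1].linkage_group):
                regions.append((chrom, run[0].pos, run[-1].pos, run[0].linkage_group))
                run = []
            if conflict:
                run.append(m)
        if run:
            regions.append((chrom, run[0].pos, run[-1].pos, run[0].linkage_group))
    return regions


def is_discordant(pair, min_insert=50_000, max_insert=250_000):
    """Assumed rule: ends on different chromosomes, or an implausible insert size."""
    if pair.chrom1 != pair.chrom2:
        return True
    span = abs(pair.pos2 - pair.pos1)
    return not (min_insert <= span <= max_insert)


def supporting_clones(region, pairs, flank=10_000):
    """List discordant BAC clones with at least one end inside the region (plus flank)."""
    chrom, start, end, _ = region
    lo, hi = start - flank, end + flank
    hits = []
    for p in pairs:
        if not is_discordant(p):
            continue
        for c, pos in ((p.chrom1, p.pos1), (p.chrom2, p.pos2)):
            if c == chrom and lo <= pos <= hi:
                hits.append(p.clone)
                break
    return hits
```

A region supported by discordant clones would then be a stronger misassembly candidate for the synteny check; requiring agreement between independent evidence types is the design choice this sketch is meant to show.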