Subirana Juan A, Messeguer Xavier
Department of Computer Science, Universitat Politècnica de Catalunya, Jordi Girona 1-3, 08034 Barcelona, Spain.
Evolutionary Genomics Group, Research Program on Biomedical Informatics (GRIB)-Hospital del Mar Research Institute (IMIM), Universitat Pompeu Fabra (UPF), Dr. Aiguader 86, 08003 Barcelona, Spain.
Genes (Basel). 2018 Oct 16;9(10):500. doi: 10.3390/genes9100500.
Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.