Department of Life Science, Chung-Ang University, Seoul, Korea.
Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Australia.
Life Sci Alliance. 2023 Feb 6;6(4). doi: 10.26508/lsa.202201744. Print 2023 Apr.
Assembling fragmented whole-genome information from sequencing data is a necessary step for further genome-wide research. However, selecting an appropriate assembly pipeline for an unknown species is difficult because genomic properties are species-specific. Our study therefore focused on the relatively stable tendencies of sequencing platforms and assembly algorithms rather than on the more variable genome sequences. A total of 212 draft and polished de novo assemblies of the repetitive yeast genome were constructed using different sequencing platforms and assembly algorithms. Our comprehensive data indicated that sequencing reads from Oxford Nanopore with R7.3 flow cells generated more continuous assemblies than those derived from the PacBio Sequel, although homopolymer-based assembly errors and chimeric contigs were present. In addition, a comparison of two second-generation sequencing platforms showed that Illumina NovaSeq 6000 provides more accurate and continuous assemblies in the second-generation-sequencing-first pipeline, whereas MGI DNBSEQ-T7 provides inexpensive and accurate reads for the polishing process. Furthermore, our insight into the relationship among computational time, read length, and coverage depth provided clues to the optimal pipelines for yeast assembly.
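The abstract compares assemblies by continuity and relates results to read length and coverage depth, without stating which metrics were used. As a minimal sketch, assuming continuity is summarized with the widely used N50 statistic and coverage depth is taken as total sequenced bases divided by genome size, the two quantities could be computed as follows. The function names and the toy values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def coverage_depth(read_lengths: Sequence[int], genome_size: int) -> float:
    """Return mean coverage depth: total read bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Toy values for illustration only (not data from the study).
    toy_contigs = [80, 70, 50, 40, 30, 20]
    toy_reads = [100, 200, 300]
    print("N50:", n50(toy_contigs))
    print("Coverage depth:", coverage_depth(toy_reads, genome_size=60))
```

A higher N50 at comparable total assembly length indicates a more continuous assembly, which is one common way to compare the outputs of different platform and assembler combinations.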