Burger Nicolaas F V, Nicolis Vittorio F, Botha Anna-Maria
Van der Byl Street, Genetics Department, JC Smuts Building, Faculty of AgriScience, Stellenbosch University, Stellenbosch, South Africa.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf105.
Aphids are a speciose family of the Hemiptera comprising >5500 species. They have adapted to feed on multiple plant species and occur on every continent on Earth. Although aphids are economically devastating, very few aphid genomes have been sequenced and assembled, and those that have been suffer from low contiguity because the genomes are repeat-rich and AT-rich. As third-generation sequencing becomes more affordable and approaches the quality of second-generation sequencing, producing more contiguous aphid genome assemblies is becoming a reality. With a growing list of long-read assemblers available, choosing which assembly tool to use becomes more complicated. In this study, six recently released long-read assemblers (Canu, Flye, Hifiasm, Mecat2, Raven, and Wtdbg2) were evaluated on several quality and contiguity metrics after assembling four populations (or biotypes) of the same species (Russian wheat aphid, Diuraphis noxia) and two unrelated aphid species that have publicly available long-read sequences. Not all assemblers fared equally well across the different read sets, but, overall, the Hifiasm and Canu assemblers performed the best. The best assemblies for each read set were also merged using quickmerge; in some cases this produced superior assemblies, and in others it introduced more errors. Ab initio gene calling on assemblies of the same read set also showed less similarity between assemblies than expected. Overall, the quality control pipeline followed during assembly resulted in chromosome-level assemblies with minimal structural or quality artefacts.
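The abstract does not name the contiguity metrics that were used. As an illustration only, the sketch below computes N50, a widely used contiguity statistic, from a list of contig lengths. The function name `n50` and the example lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig to the shortest, the running total first reaches at least half
    of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the study.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

A higher N50 for the same total assembly length indicates that more of the genome sits in long contigs, which is why the statistic is commonly reported alongside quality measures when comparing assemblers.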