Jauhal April A, Newcomb Richard D
School of Biological Sciences, University of Auckland, Auckland, New Zealand.
The New Zealand Institute for Plant & Food Research, Auckland, New Zealand.
Mol Ecol Resour. 2021 Jul;21(5):1416-1421. doi: 10.1111/1755-0998.13364. Epub 2021 Mar 9.
With the growing number of publicly available eukaryotic genome assemblies and user-friendly bioinformatics tools, researchers have more opportunities than ever to use genomic resources in their work. Although genome quality has multiple dimensions, it is often reduced to a single score that may not correlate with other metrics or be appropriate for every application of an assembly. To assess whether the commonly reported N50 value reliably predicts a separate dimension of genome quality, gene space completeness, we performed a meta-analysis of 611 published articles on eukaryotic genomes that reported BUSCO scores in addition to the typical N50 score. We found that although assemblies with relatively high contig and scaffold N50 values consistently had high BUSCO scores, a high BUSCO score could also be obtained from assemblies with a low N50. This reinforces that, despite its ubiquity, N50 is not a perfect proxy for all measures of genome accuracy. Our data also suggest that variation in BUSCO scores among assemblies with poor N50 scores may be related to the number of introns in conserved eukaryotic genes. We stress the importance of screening and evaluating assembly quality with tools appropriate to the intended application, and urge wider reporting of genome assessment metrics beyond N50. We also discuss the potential limitations of BUSCO and suggest improvements for assessing gene space within genome assemblies.
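For readers unfamiliar with the contiguity metric discussed above, the following is a minimal sketch of how N50 is conventionally computed: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy length lists are illustrative assumptions and are not taken from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (not from the article): sort lengths from longest
    to shortest and return the length at which the running total first
    reaches half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Two toy assemblies with the same total length (300) but different contiguity.
contiguous = [100, 80, 50, 30, 20, 10, 10]
fragmented = [50] * 6

print(n50(contiguous))  # 80
print(n50(fragmented))  # 50
```

The calculation uses sequence lengths only, so it carries no information about which genes an assembly contains. That is consistent with the abstract's point that N50 and gene space completeness are separate dimensions of quality.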