Biomedical Informatics Research Group, Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316, Oslo, Norway.
Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066 Blindern, NO-0316, Oslo, Norway.
BMC Bioinformatics. 2020 Feb 21;21(1):66. doi: 10.1186/s12859-020-3414-0.
Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. The analysis results are highly dependent on the quality of the genome assemblies used. Assessment of the assembly accuracy may significantly increase the reliability of the analysis results and is therefore of great importance.
Here, we present a new tool called NucBreak aimed at localizing structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. The approach taken by existing alternative tools is based on analysing reads that do not map properly to the assembly, for instance discordantly mapped reads, soft-clipped reads and singletons. NucBreak uses an entirely different method to localize the errors. It is based on analysing the alignments of reads that are properly mapped to an assembly, and it exploits information about the alternative read alignments. It does not annotate detected errors. We have compared NucBreak with other existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam, as well as with several structural variant (SV) detection tools, including BreakDancer, Lumpy, and Wham, using both simulated and real datasets.
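To illustrate the general idea of reasoning from properly mapped reads rather than from badly mapped ones, the following minimal sketch reports assembly regions that no properly mapped read covers. It is an assumption-laden toy, not NucBreak's actual algorithm: the function name `uncovered_regions`, the half-open interval representation of read alignments, and the zero-coverage criterion are all hypothetical choices made for illustration, and the use of alternative read alignments is not modelled.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the assembly


def uncovered_regions(read_spans: List[Interval], assembly_length: int) -> List[Interval]:
    """Return regions of the assembly covered by no properly mapped read.

    Hypothetical illustration only: a region without support from
    properly mapped reads is treated as a candidate error location.
    """
    gaps: List[Interval] = []
    covered_to = 0  # everything before this position is covered or already reported
    for start, end in sorted(read_spans):
        if start > covered_to:
            # No read spans [covered_to, start): record it as a candidate.
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < assembly_length:
        gaps.append((covered_to, assembly_length))
    return gaps


if __name__ == "__main__":
    spans = [(0, 100), (80, 200), (250, 400)]
    print(uncovered_regions(spans, 500))
```

In practice the read spans would come from an alignment file filtered to properly mapped reads; that parsing step is left out here to keep the sketch self-contained.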
The benchmarking results show that NucBreak, in general, predicts assembly errors of different types and sizes with relatively high sensitivity and with a lower false discovery rate than the other tools. This balance between sensitivity and false discovery rate makes NucBreak a good alternative to the existing assembly accuracy assessment tools and SV detection tools. NucBreak is freely available at https://github.com/uio-bmi/NucBreak under the MPL license.
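For readers unfamiliar with the two benchmarking metrics named above, the sketch below computes them from counts of true positives (TP), false positives (FP), and false negatives (FN) using the standard definitions: sensitivity = TP / (TP + FN) and false discovery rate = FP / (TP + FP). The function names and the example counts are hypothetical and are not taken from the paper's results; how a predicted error is matched to a true error is also outside the scope of this sketch.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true errors that were detected: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def false_discovery_rate(tp: int, fp: int) -> float:
    """Fraction of predictions that were wrong: FP / (TP + FP)."""
    total = tp + fp
    return fp / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    tp, fp, fn = 80, 20, 20
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"FDR         = {false_discovery_rate(tp, fp):.2f}")
```

A tool that reports many candidate errors can raise its sensitivity at the cost of a higher false discovery rate, which is why the two metrics are considered together.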