Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Pde, Parkville, VIC, 3052, Australia.
Department of Medical Biology, University of Melbourne, Parkville, VIC, 3010, Australia.
Nat Commun. 2019 Jul 19;10(1):3240. doi: 10.1038/s41467-019-11146-4.
In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but this tends to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report a comprehensive evaluation of 10 SV callers, selected through a rigorous process to span the breadth of detection approaches, using both high-quality reference cell lines and simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers.
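Two ideas in this abstract, scoring callers against a truth set and ensemble calling, can be made concrete with a small sketch. The code below is an illustration only and is not the evaluation procedure or ensemble strategy used in the study: the `SV` record, the 100 bp breakpoint tolerance, the greedy one-to-one matching in `precision_recall`, and the "supported by at least `min_support` callers" rule in `ensemble` are all hypothetical choices made for the example, and the call sets are invented toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: chromosome plus two breakpoint positions."""
    chrom: str
    start: int
    end: int


def matches(call: SV, truth: SV, tol: int = 100) -> bool:
    """Assumed matching rule: same chromosome, both breakpoints within `tol` bp."""
    return (call.chrom == truth.chrom
            and abs(call.start - truth.start) <= tol
            and abs(call.end - truth.end) <= tol)


def precision_recall(calls, truth, tol=100):
    """Greedy one-to-one matching of calls to truth events (an assumption;
    real benchmarking tools use more careful matching)."""
    unmatched = list(truth)
    tp = 0
    for c in calls:
        for i, t in enumerate(unmatched):
            if matches(c, t, tol):
                tp += 1
                del unmatched[i]  # each truth event can be matched only once
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def ensemble(callsets, min_support=2, tol=100):
    """Hypothetical voting ensemble: keep a call if at least `min_support`
    call sets (including its own) contain a matching call, and drop calls
    that duplicate one already kept."""
    kept = []
    for calls in callsets:
        for c in calls:
            support = sum(any(matches(c, o, tol) for o in other) for other in callsets)
            if support >= min_support and not any(matches(c, k, tol) for k in kept):
                kept.append(c)
    return kept


# Toy example data (invented for illustration).
truth = [SV("chr1", 1000, 2000), SV("chr1", 5000, 5600), SV("chr2", 300, 900)]
caller_a = [SV("chr1", 1010, 1995), SV("chr1", 8000, 8100)]
caller_b = [SV("chr1", 990, 2020), SV("chr2", 310, 880)]
caller_c = [SV("chr2", 295, 905), SV("chr1", 8020, 8090)]

ens = ensemble([caller_a, caller_b, caller_c], min_support=2)
print("caller_a:", precision_recall(caller_a, truth))
print("ensemble:", precision_recall(ens, truth))
```

In this toy case, voting improves precision over `caller_a` alone, but it also keeps a false positive that two callers happen to agree on (the event near chr1:8000). This shows why the efficacy of ensemble calling is something to measure rather than assume.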