Mu Peizheng, Feng Xiangyan, Tong Lanxin, Huang Jie, Zhu Chaoqun, Wang Fei, Quan Wei, Ma Yuanjun, Dong Yucui, Zhu Xiao
School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China.
Department of Hematology, Yantai Yuhuangding Hospital Affiliated to Qingdao University, Yantai, Shandong 264009, China.
Comput Struct Biotechnol J. 2025 Jun 29;27:2851-2862. doi: 10.1016/j.csbj.2025.06.045. eCollection 2025.
Accurate benchmarking of structural variant (SV) detection is essential for advancing the development and application of human whole-genome sequencing (WGS). A fundamental challenge in benchmarking SV callsets is determining whether two SVs represent the same event. SV detection algorithms that rely on read alignment are inherently constrained by differences in how aligners account for variation and how they are implemented, so the same event may be represented differently across callers. Traditional benchmarking focuses on comparing and matching individual variants, which makes it difficult to capture the relationships among multiple adjacent variants. We introduce ASVBM, an improved benchmarking framework that adopts the notion of latent positives and applies a joint analysis and validation strategy to local variants. This improvement stems from the observation that several smaller variants can be nearly equivalent to a single larger variant. We comprehensively evaluated the performance of six state-of-the-art variant calling pipelines using real WGS datasets. Under multiple matching criteria, ASVBM's joint analysis strategy uncovers potential equivalences between the callset and the benchmark set, thereby reducing false mismatches caused by differences in variant representation. ASVBM is available at https://github.com/zhuxiao/asvbm.
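The idea of latent positives can be illustrated with a toy example. The sketch below is not ASVBM's implementation: the `Deletion` type, the `simple_match` and `joint_match` functions, and the thresholds (a 500 bp breakpoint window and a 0.7 size-similarity ratio) are hypothetical choices for illustration, and only deletions are handled. It shows how three small deletion calls can each fail a one-to-one comparison against a 1,000 bp benchmark deletion yet, taken together, nearly account for it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Deletion:
    """Hypothetical minimal deletion record (not ASVBM's data model)."""
    chrom: str
    start: int   # 0-based start position on the reference
    length: int  # number of deleted bases

    @property
    def end(self) -> int:
        return self.start + self.length


def size_similarity(a: int, b: int) -> float:
    """Ratio of the smaller size to the larger size (1.0 means identical)."""
    return min(a, b) / max(a, b)


def simple_match(call: Deletion, truth: Deletion,
                 max_dist: int = 500, min_sim: float = 0.7) -> bool:
    """One-to-one criterion: same chromosome, nearby start, similar size."""
    return (
        call.chrom == truth.chrom
        and abs(call.start - truth.start) <= max_dist
        and size_similarity(call.length, truth.length) >= min_sim
    )


def joint_match(calls: Sequence[Deletion], truth: Deletion,
                max_dist: int = 500, min_sim: float = 0.7) -> bool:
    """Hypothetical joint criterion: pool all deletions lying in a window
    around the benchmark event and compare their combined size to it."""
    local = [
        c for c in calls
        if c.chrom == truth.chrom
        and c.start >= truth.start - max_dist
        and c.end <= truth.end + max_dist
    ]
    if not local:
        return False
    combined = sum(c.length for c in local)
    return size_similarity(combined, truth.length) >= min_sim


if __name__ == "__main__":
    truth = Deletion("chr1", 10_000, 1_000)
    calls = [
        Deletion("chr1", 10_000, 400),
        Deletion("chr1", 10_450, 350),
        Deletion("chr1", 10_850, 200),
    ]
    print([simple_match(c, truth) for c in calls])  # [False, False, False]
    print(joint_match(calls, truth))                # True
```

Under one-to-one matching all three calls would be counted as false positives and the benchmark deletion as a false negative; pooling the local calls (950 bp combined versus 1,000 bp) recovers the equivalence. A real benchmarking tool would also need to handle other SV types, overlapping calls, and sequence-level validation, which this sketch omits.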