Liu Yadong, Jiang Tao, Gao Yan, Liu Bo, Zang Tianyi, Wang Yadong
Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, China.
Front Cell Dev Biol. 2021 Aug 13;9:731424. doi: 10.3389/fcell.2021.731424. eCollection 2021.
With the rapid development of short-read sequencing technologies, many population-scale resequencing studies have been carried out in recent years to study the associations between human genome variants and various phenotypes. Variant calling, which aims to comprehensively discover the genomic variants in sequenced samples, is one of the core bioinformatics tasks in such studies. Many efforts have been made to develop short read-based variant calling approaches; however, state-of-the-art tools are still computationally expensive. Meanwhile, cutting-edge genomics studies also place higher demands on the yield of variant calling. Herein, we propose the Partial-Order Alignment-based single-nucleotide variant (SNV) and Indel caller (Psi-caller), a lightweight variant calling algorithm that simultaneously achieves high performance and yield. Psi-caller recognizes candidate variant sites and divides them into three categories according to the complexity and location of their signatures. It then employs methods including a binomial model, partial-order alignment, and de Bruijn graph-based local assembly to handle the respective categories of candidate sites and to call and genotype SNVs and indels. Benchmarks on simulated and real short-read sequencing data sets demonstrate that Psi-caller is several times faster than state-of-the-art tools, with higher or equal sensitivity and accuracy. It has the potential to handle large-scale data sets in cutting-edge genomics studies well.
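To make the binomial model mentioned above concrete, the following is a minimal sketch of how a simple biallelic site could be genotyped from read counts. It is not Psi-caller's implementation: the function names, the fixed `error_rate` of 0.01, and the three-genotype diploid model are all assumptions chosen for illustration.

```python
import math


def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial log-likelihood of the read counts under each diploid genotype.

    Assumption (illustrative, not taken from the paper): a read shows the
    alternate allele with probability error_rate for 0/0, 0.5 for 0/1,
    and 1 - error_rate for 1/1. error_rate must lie strictly in (0, 1).
    """
    n = ref_count + alt_count
    log_comb = math.log(math.comb(n, alt_count))
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        gt: log_comb + alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest binomial log-likelihood."""
    lls = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(lls, key=lls.get)


if __name__ == "__main__":
    # Hypothetical (ref, alt) read counts at three candidate sites.
    for ref, alt in [(20, 0), (10, 10), (1, 19)]:
        print(ref, alt, call_genotype(ref, alt))
```

A model of this kind needs only per-site allele counts, which is why it is cheap compared with alignment or assembly. Which sites Psi-caller routes to its binomial model, and how it parameterizes that model, is described in the full paper rather than in this abstract.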