English Adam C, Cunial Fabio, Metcalf Ginger A, Gibbs Richard A, Sedlazeck Fritz J
Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
bioRxiv. 2024 Oct 25:2024.10.22.619642. doi: 10.1101/2024.10.22.619642.
Accurately genotyping structural variant (SV) alleles is crucial to genomics research. We present kanpig, a novel SV genotyping method that leverages variant graphs and k-mer vectors to rapidly generate accurate SV genotypes. We benchmark kanpig against the latest SV benchmarks and show a single-sample genotyping concordance of 82.1%, compared with an average of 66.3% for existing genotypers. We explore kanpig's applicability to multi-sample projects by benchmarking project-level VCFs containing 47 genetically diverse samples and find that kanpig accurately genotypes complex loci (e.g., SVs neighboring other SVs), achieving much higher genotyping concordance than other tools. Kanpig requires only 43 seconds to process a single sample's 20x long reads and can be run on PacBio or ONT long reads.
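To make the idea of a k-mer vector concrete, the following is a minimal sketch and not kanpig's actual implementation. It assumes a sequence is summarized as counts of every possible DNA k-mer in a fixed order, and that two such vectors are compared with cosine similarity. The function names, the choice of k = 3, and the similarity measure are all illustrative assumptions; the abstract does not specify any of them.

```python
import math
from itertools import product


def kmer_vector(seq, k=3):
    """Count each DNA k-mer in seq, in fixed lexicographic order (hypothetical helper)."""
    # Map every possible k-mer over ACGT to a position in the vector.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip k-mers containing N or other symbols
            vec[pos] += 1
    return vec


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length count vectors (assumed comparison measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two made-up insertion sequences, one read-derived and one from a VCF allele.
    allele = "ACGTACGTTTGACCA"
    read = "ACGTACGTTTGACGA"
    print(cosine_similarity(kmer_vector(allele), kmer_vector(read)))
```

The appeal of a fixed-length vector in this kind of sketch is that comparing two alleles becomes simple arithmetic on counts and needs no alignment, which fits the abstract's emphasis on speed.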