University of Michigan, Ann Arbor, USA.
BMC Bioinformatics. 2023 Mar 16;24(1):98. doi: 10.1186/s12859-023-05193-4.
Despite recent improvements in nanopore basecalling accuracy, germline variant calling of small insertions and deletions (INDELs) remains poor. Although precision and recall for single nucleotide polymorphisms (SNPs) now exceed 99.5%, INDEL recall remains below 80% for standard R9.4.1 flow cells. We show that read phasing and realignment can recover a significant portion of false negative INDELs. In particular, we extend Needleman-Wunsch affine gap alignment by introducing new gap penalties that more accurately align repeated n-polymer sequences such as homopolymers ([Formula: see text]) and tandem repeats ([Formula: see text]). At the same precision, haplotype phasing improves INDEL recall from 63.76 to [Formula: see text] and nPoRe realignment improves it further to [Formula: see text].
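For orientation, the baseline that the abstract extends can be sketched as follows. This is a minimal illustration of standard Needleman-Wunsch global alignment with affine gap penalties (Gotoh's three-matrix recurrence), not the nPoRe algorithm itself: the n-polymer gap penalties introduced in the paper are not reproduced here. The function name `affine_gap_score` and all scoring values are assumptions chosen for illustration only.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gaps.

    A gap of length k contributes gap_open + k * gap_extend.
    Scoring values are illustrative, not taken from the paper.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (deletion relative to a)
    # Y: b[j-1] aligned to a gap (insertion relative to a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap pays gap_open + gap_extend; extending pays gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
            )

    return max(M[n][m], X[n][m], Y[n][m])
```

With these illustrative scores, aligning the homopolymer `AAAA` against `AA` favors one length-2 gap over two length-1 gaps, but the penalty does not depend on where in the repeat the gap falls or on the repeat unit. That insensitivity to repeat context is the kind of limitation that repeat-aware gap penalties are meant to address.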