Gaston Jeffry M, Alm Eric J, Zhang An-Ni
Google, Cambridge, USA.
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, USA.
BMC Biol. 2024 Apr 22;22(1):90. doi: 10.1186/s12915-024-01891-4.
Accurate identification of genetic variants, such as point mutations and insertions/deletions (indels), is crucial for genetic studies in areas such as epidemic tracking, population genetics, and disease diagnosis. Genetic studies of microbiomes often require processing numerous sequencing datasets, which calls for variant identifiers that are fast, accurate, and robust.
We present QuickVariants, a bioinformatics tool that effectively summarizes variant information from read alignments and identifies variants. When tested on diverse bacterial sequencing data, QuickVariants demonstrates a ninefold higher median speed than bcftools, a widely used variant identifier, with higher accuracy in identifying both point mutations and indels. This accuracy extends to variant identification in virus samples, including SARS-CoV-2, where QuickVariants produces significantly fewer false negative indels than bcftools. The high accuracy of QuickVariants is further demonstrated by its detection of more Omicron-specific indels (5 versus 0) and point mutations (61 versus 48-54) than bcftools in sewage metagenomes dominated by Omicron variants. Much of the reduced accuracy of bcftools was attributable to its misinterpretation of indels, which often produced false negative indels and false positive point mutations at the same locations.
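The link between a missed indel and spurious point mutations at the same location can be illustrated with a toy example. The sketch below is not the algorithm of QuickVariants or bcftools; the function name `variants_from_alignment`, the sequences, and the 0-based position convention are assumptions made only for illustration. It walks one read along a reference using a CIGAR string restricted to the `M`, `I`, and `D` operations, and shows that the same read yields one deletion when the gap is modeled and several mismatches when it is not.

```python
import re

def variants_from_alignment(ref, read, start, cigar):
    """Toy variant extraction from one aligned read (hypothetical helper).

    ref   : reference sequence
    read  : read sequence
    start : 0-based reference position where the alignment begins
    cigar : CIGAR string using only M (match/mismatch), I, and D
    Returns a list of (ref_pos, kind, ref_allele, alt_allele) tuples.
    """
    variants = []
    r = start  # current index into the reference
    q = 0      # current index into the read
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        n = int(length)
        if op == "M":
            # Aligned columns: report every mismatch as a point mutation.
            for i in range(n):
                if ref[r + i] != read[q + i]:
                    variants.append((r + i, "snp", ref[r + i], read[q + i]))
            r += n
            q += n
        elif op == "D":
            # Bases present in the reference but absent from the read.
            variants.append((r, "del", ref[r:r + n], ""))
            r += n
        else:  # op == "I"
            # Bases present in the read but absent from the reference.
            variants.append((r, "ins", "", read[q:q + n]))
            q += n
    return variants

# Assumed toy data: the read lacks the three reference bases "TGC".
reference = "ACGTTGCAAGCT"
read = "ACGTAAGCT"

# Gap-aware alignment: a single deletion is reported.
gapped = variants_from_alignment(reference, read, 0, "4M3D5M")

# Gap-unaware alignment of the same read: the deletion is missed
# and the shifted bases show up as point mutations instead.
ungapped = variants_from_alignment(reference, read, 0, "9M")

print(gapped)
print(ungapped)
```

In this toy case the gap-unaware interpretation both loses the true deletion (a false negative indel) and reports mismatches starting at the very position where the deletion begins (false positive point mutations), which is the pattern described above.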
We introduce QuickVariants, a fast, accurate, and robust bioinformatics tool designed for identifying genetic variants in microbial studies. QuickVariants is available at https://github.com/caozhichongchong/QuickVariants.