The School of Informatics, Computing, and Cyber Systems, Northern Arizona University, 1295 S Knoles Dr., Flagstaff, Arizona, 86001, USA.
Pathogen and Microbiome Institute, Northern Arizona University, 1395 S Knoles Dr., Flagstaff, Arizona, 86001, USA.
BMC Bioinformatics. 2018 Jun 11;19(1):222. doi: 10.1186/s12859-018-2225-z.
Targeted PCR amplicon sequencing (TAS) techniques provide a sensitive, scalable, and cost-effective way to query and identify closely related bacterial species and strains. Typically, this is accomplished by targeting housekeeping genes that provide resolution down to the family, genus, and sometimes species level. Unfortunately, this level of resolution is not sufficient in many applications where strain-level identification of bacteria is required (biodefense, forensics, clinical diagnostics, and outbreak investigations). Adding more genomic targets will increase the resolution, but the challenge is identifying the appropriate targets. VaST was developed to address this challenge by finding the minimum number of targets that, in combination, achieve maximum strain-level resolution for any strain complex. The final combination of target regions identified by the algorithm produces a unique haplotype for each strain, which can be used as a fingerprint for identifying unknown samples in a TAS assay. VaST ensures that the targets have conserved primer regions so that they can be amplified in all of the known strains. It also favors the inclusion of targets with basal variants, which makes the set more robust when identifying previously unseen strains.
We analyzed VaST's performance using a number of different pathogenic species that are relevant to human disease outbreaks and biodefense. Achieving full resolution required 20 to 88% fewer targets than would be needed in the worst case, and most of the resolution was achieved within the first 20 targets. We computationally and experimentally validated one of the VaST panels and found that the targets led to accurate phylogenetic placement of strains, even when the strains were not part of the original panel design.
VaST is open-source software that, when provided with a set of variant sites, can find the minimum number of sites that will provide maximum resolution of a strain complex. It has many run-time options that can accommodate a wide range of applications. VaST can be an effective tool in the design of strain identification panels that, when combined with TAS technologies, offer an efficient and inexpensive strain typing protocol.
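The idea of choosing a small set of targets that together separate every strain can be illustrated with a minimal Python sketch. It is not VaST's code: it assumes a simple greedy heuristic (repeatedly pick the target that splits the current strain groups into the most haplotype groups), and it ignores primer conservation and the preference for basal variants described above. The function names, target names, and allele data are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_groups(groups: List[List[str]], alleles: Dict[str, str]) -> List[List[str]]:
    """Refine each strain group by the allele every strain carries at one target."""
    refined: List[List[str]] = []
    for group in groups:
        by_allele: Dict[str, List[str]] = {}
        for strain in group:
            by_allele.setdefault(alleles[strain], []).append(strain)
        refined.extend(by_allele.values())
    return refined


def greedy_select(
    targets: Dict[str, Dict[str, str]], strains: List[str]
) -> Tuple[List[str], List[List[str]]]:
    """Greedy heuristic (an assumption, not VaST's documented method).

    Adds the target that yields the most strain groups at each step and
    stops when no remaining target improves the resolution.
    """
    groups = [list(strains)]  # start with all strains indistinguishable
    chosen: List[str] = []
    remaining = dict(targets)
    while remaining:
        best_name, best_groups = None, groups
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            candidate = split_groups(groups, remaining[name])
            if len(candidate) > len(best_groups):
                best_name, best_groups = name, candidate
        if best_name is None:  # no target adds resolution
            break
        chosen.append(best_name)
        groups = best_groups
        del remaining[best_name]
    return chosen, groups


# Hypothetical toy data: allele call per strain at each candidate target.
example_strains = ["A", "B", "C", "D"]
example_targets = {
    "t1": {"A": "0", "B": "0", "C": "1", "D": "1"},
    "t2": {"A": "0", "B": "1", "C": "0", "D": "1"},
    "t3": {"A": "0", "B": "0", "C": "0", "D": "1"},
}
chosen, groups = greedy_select(example_targets, example_strains)
print(chosen, len(groups))
```

In this toy case two of the three candidate targets are enough to give each of the four strains its own haplotype. A greedy choice is not guaranteed to find the true minimum in general; it is used here only because it is short and easy to follow.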