Ben-Gurion University of the Negev, Beer-Sheva, Israel.
Public Health Services, Ministry of Health, Jerusalem, Israel.
Clin Microbiol Infect. 2017 May;23(5):306-310. doi: 10.1016/j.cmi.2017.01.002. Epub 2017 Jan 7.
Whole genome sequencing (WGS) has revolutionized the subtyping of Legionella pneumophila, but calling the traditional sequence-based type (SBT) from genomic data is hampered by the presence of multiple copies of the mompS locus. We propose a novel bioinformatics solution to overcome this limitation, ensuring the feasibility of WGS for cluster investigation.
We designed a novel approach based on the alignment of raw reads to a reference sequence. With WGS, reads originating from either of the two mompS copies cannot be differentiated. Therefore, when non-identical copies were present, we applied a read-filtering strategy in which reads were retained based on their alignment to unique 'anchors' in the reference sequence. If a minimal read coverage (≥3X) was achieved after filtering, a consensus sequence was built from the mapped reads and the sequence-based typing allele was then called. The entire procedure was implemented in a Perl script.
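The filter-then-consensus logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Perl script. Reads are assumed to be already aligned without gaps, with each mate given as a reference start coordinate and a sequence. A read pair is kept if either mate contains an exact anchor sequence assumed to be unique to the target mompS copy. The consensus is a simple per-position majority vote, and the allele is called by exact match. The function names, toy sequences and allele labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a mate is (aligned_start, sequence) on the
# reference, assuming ungapped alignment; a read pair is two mates.
Mate = Tuple[int, str]
ReadPair = Tuple[Mate, Mate]

MIN_COVERAGE = 3  # the >=3X threshold stated in the abstract


def filter_by_anchor(pairs: List[ReadPair], anchor: str) -> List[ReadPair]:
    """Keep pairs where either mate contains the (assumed) copy-specific anchor."""
    return [p for p in pairs if any(anchor in seq for _, seq in p)]


def consensus(pairs: List[ReadPair], start: int, end: int,
              min_cov: int = MIN_COVERAGE) -> Optional[str]:
    """Majority-vote consensus over reference positions [start, end).

    Returns None if any position is covered by fewer than min_cov bases.
    """
    columns = [Counter() for _ in range(end - start)]
    for pair in pairs:
        for pos, seq in pair:
            for offset, base in enumerate(seq):
                ref_pos = pos + offset
                if start <= ref_pos < end:
                    columns[ref_pos - start][base] += 1
    bases = []
    for col in columns:
        if sum(col.values()) < min_cov:
            return None  # coverage too low after filtering: no call
        bases.append(col.most_common(1)[0][0])
    return "".join(bases)


def call_allele(seq: Optional[str], alleles: Dict[str, str]) -> Optional[str]:
    """Exact-match lookup of a consensus against known allele sequences."""
    if seq is None:
        return None
    for allele_id, allele_seq in alleles.items():
        if allele_seq == seq:
            return allele_id
    return None


# Toy demo (hypothetical data): mate 1 covers a flanking region that carries
# the anchor only in the target copy; mate 2 covers the typing region 10-17.
target = [((0, "GGATCC"), (10, "ACGTACGT"))] * 3
other = [((0, "GGTTCC"), (10, "ACGTTCGT"))] * 3
alleles = {"toy_1": "ACGTACGT", "toy_2": "ACGTTCGT"}

kept = filter_by_anchor(target + other, anchor="GGATCC")
cons = consensus(kept, start=10, end=18)
print(cons, call_allele(cons, alleles))  # ACGTACGT toy_1
```

Without the anchor filter, the toy reads from the two copies would produce a 3:3 tie at the single differing position. This mirrors the ambiguity that non-identical mompS copies create. With only two retained pairs, `consensus` returns `None`, corresponding to a sample that fails the coverage threshold.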
The method was validated on a diverse sample of 265 L. pneumophila genomes comprising 59 different sequence types (STs) and 23 mompS variants; 57 of the 265 (22%) had non-identical mompS copies. In 237 of the 265 samples (89.4%), mompS calling was successful, and no erroneous calls occurred. A success rate of 98.1% was recorded among the 109 samples that met quality requirements. The method was superior to alternative approaches.
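The proportions in the validation figures above can be reproduced from the stated counts. The following minimal sketch uses a hypothetical helper, `pct`, and covers only the two proportions whose numerator and denominator are both given in the abstract.

```python
def pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Percentage of numerator/denominator, rounded to `digits` decimals."""
    return round(100 * numerator / denominator, digits)


print(pct(57, 265, 0))  # 22.0 -> genomes with non-identical mompS copies
print(pct(237, 265))    # 89.4 -> successful mompS calls across all samples
```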
As WGS becomes more accessible, technical difficulties will arise in routine clinical and surveillance work. The case of mompS in L. pneumophila is an example of such limitations, which necessitate the development of novel computational solutions that meet end-user demands.