BMC Bioinformatics. 2013;14 Suppl 14(Suppl 14):S15. doi: 10.1186/1471-2105-14-S14-S15. Epub 2013 Oct 9.
Pulsed-field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. The major drawbacks of commercially available PFGE analysis programs are their difficulty in handling large datasets and their limited range of analysis tools. New analytical tools for PFGE data mining are needed to make full use of the valuable data in large surveillance databases.
In this study, a software package was developed that implements five bioinformatics approaches for the analysis and visualization of PFGE fingerprints: PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict the serotypes of Salmonella isolates from their PFGE patterns. The hierarchical cluster analysis approach can be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarity or dissimilarity of any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes. They also provide a summary of the overall relationships between user-selected serotypes and of the band markers that distinguish these serotypes. The functionality of these tools was illustrated on PFGE fingerprinting data from CDC's PulseNet.
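The distance matrix and hierarchical cluster analysis steps described above can be illustrated with a minimal sketch. It is not the package's implementation: the abstract does not state which similarity measure or linkage rule the software uses, so the Dice coefficient and average linkage (UPGMA) are assumed here only because they are common choices for band-based fingerprints. The isolate names, band positions, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical isolates, each described by the set of standardized band
# positions present in its PFGE pattern (values are illustrative only).
patterns = {
    "iso1": {1, 2, 3, 4},
    "iso2": {1, 2, 3, 5},
    "iso3": {7, 8, 9},
}

def dice_distance(a, b):
    """Return 1 - Dice coefficient for two sets of band positions."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

def distance_matrix(pats):
    """Pairwise distances keyed by (name, name), including the diagonal."""
    return {(p, q): dice_distance(pats[p], pats[q]) for p in pats for q in pats}

def average_linkage(pats):
    """Agglomerative clustering with average linkage (assumed, not confirmed).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of isolate names.
    """
    dist = distance_matrix(pats)

    def avg(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in pats]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = average_linkage(patterns)
```

In this toy input, the two isolates that share three of their four bands are merged first, and the isolate with no shared bands joins last. The merge order is the information a dendrogram would display.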
The bioinformatics approaches in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints. Fast and accurate prediction makes it possible to obtain Salmonella serotype information before conventional serological methods are pursued. The development of bioinformatics tools that distinguish PFGE markers and serotype-specific patterns will enhance PFGE data retrieval, interpretation and serotype identification, and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.