Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA.
J Clin Microbiol. 2012 May;50(5):1524-32. doi: 10.1128/JCM.00111-12. Epub 2012 Feb 29.
A classification model is presented for rapid identification of Salmonella serotypes based on pulsed-field gel electrophoresis (PFGE) fingerprints. The classification model was developed using random forest and support vector machine algorithms and was then applied to a database of 45,923 PFGE patterns, randomly selected from all submissions to CDC PulseNet from 2005 to 2010. The selected patterns included the 20 most frequent serotypes and 12 less frequent serotypes from various sources. The prediction accuracies for the 32 serotypes ranged from 68.8% to 99.9%, with an overall accuracy of 96.0%, for the random forest classification and from 67.8% to 100.0%, with an overall accuracy of 96.1%, for the support vector machine classification. The prediction system improves reliability and accuracy and provides a new tool for early, rapid screening and source tracking of outbreak isolates. It is especially useful for obtaining serotype information before conventional serotyping is complete. The system also works well for isolates that conventional methods serotype as "unknown", and it is useful for laboratories where standard serotyping is not available.
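The abstract reports both a per-serotype accuracy range and a single overall accuracy. The sketch below illustrates how the two figures can relate, assuming that per-serotype accuracy means the fraction of isolates of a given serotype that were predicted correctly and that overall accuracy means the fraction of all isolates predicted correctly. The function name `per_serotype_accuracy`, the labels `common_serotype` and `rare_serotype`, and the counts are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_serotype_accuracy(true_labels, predicted_labels):
    """Return ({serotype: accuracy}, overall_accuracy) for paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one isolate is required")

    # Number of isolates of each true serotype.
    totals = Counter(true_labels)
    # Number of correctly predicted isolates of each true serotype.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    per_class = {serotype: correct[serotype] / n for serotype, n in totals.items()}
    overall = sum(correct.values()) / len(true_labels)
    return per_class, overall


# Hypothetical toy data (NOT from the study): 90 isolates of a frequent
# serotype and 10 isolates of a less frequent one.
true_labels = ["common_serotype"] * 90 + ["rare_serotype"] * 10
predicted_labels = (
    ["common_serotype"] * 89 + ["rare_serotype"] * 1    # 89 of 90 correct
    + ["rare_serotype"] * 7 + ["common_serotype"] * 3   # 7 of 10 correct
)

per_class, overall = per_serotype_accuracy(true_labels, predicted_labels)
for serotype, accuracy in per_class.items():
    print(f"{serotype}: {accuracy:.3f}")
print(f"overall: {overall:.3f}")
```

Under these assumptions the overall figure is weighted by how many isolates each serotype contributes, so a high overall accuracy can coexist with a much lower accuracy for a sparsely represented serotype.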