Slabbinck Bram, De Baets Bernard, Dawyndt Peter, De Vos Paul
Research Unit Knowledge-based Systems, Faculty of Bioscience Engineering, Ghent University, Coupure links 653, 9000 Ghent, Belgium.
Syst Appl Microbiol. 2009 May;32(3):163-76. doi: 10.1016/j.syapm.2009.01.003. Epub 2009 Feb 23.
In the last decade, bacterial taxonomy has witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only profiles resulting from standard growth conditions were retained. The resulting data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. By applying machine learning techniques in a supervised strategy, we built different computational models for genus and species identification. Three techniques were considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification was achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models gave good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests yielded sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forest models outperformed those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning is very useful for FAME-based bacterial species identification. Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
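The sensitivity values reported above are per-genus summaries of species-level identification. As a minimal sketch of how such a figure can be computed, the Python below calculates per-species sensitivity, TP / (TP + FN), and averages it over species. The abstract does not state how sensitivity was aggregated, so the unweighted (macro) average is an assumption here, and the function names and toy labels are hypothetical and not taken from the study.

```python
from collections import defaultdict


def per_class_sensitivity(true_labels, predicted_labels):
    """Return {label: TP / (TP + FN)} for every label that occurs in true_labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    true_positives = defaultdict(int)
    totals = defaultdict(int)  # TP + FN per label, i.e. profiles truly of that label
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == predicted:
            true_positives[truth] += 1
    return {label: true_positives[label] / totals[label] for label in totals}


def macro_sensitivity(true_labels, predicted_labels):
    """Unweighted mean of the per-class sensitivities (an assumed aggregation)."""
    scores = per_class_sensitivity(true_labels, predicted_labels)
    if not scores:
        raise ValueError("at least one labelled profile is required")
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, invented for illustration only.
true_species = ["B. subtilis", "B. subtilis", "B. cereus",
                "B. cereus", "B. cereus", "B. pumilus"]
predicted_species = ["B. subtilis", "B. cereus", "B. cereus",
                     "B. cereus", "B. pumilus", "B. pumilus"]

species_scores = per_class_sensitivity(true_species, predicted_species)
overall = macro_sensitivity(true_species, predicted_species)
print(species_scores)
print(round(overall, 3))
```

A macro average gives every species equal weight regardless of how many profiles represent it, which matters for data sets such as this one where the number of profiles per species varies; a profile-weighted average would instead reduce to overall accuracy.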