Department of Chemistry, Shanghai Stomatological Hospital, Fudan University, Shanghai 200000, China.
Department of Pharmacy, University of Groningen, 9700 AD Groningen, The Netherlands.
J Proteome Res. 2018 Jun 1;17(6):2124-2130. doi: 10.1021/acs.jproteome.8b00065. Epub 2018 May 21.
Bacterial identification is of great importance in clinical diagnosis, environmental monitoring, and food safety control. Among the available strategies, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has drawn significant interest and is already used clinically. However, current bioinformatics solutions rely on spectral libraries to identify bacterial strains. Generating a spectral library requires acquiring MALDI-TOF spectra from monoculture bacterial colonies, which is time-consuming and not possible for many species and strains. We propose a strategy for bacterial typing by MALDI-TOF that uses protein sequences from a public database, UniProt. Using 403 MALDI-TOF spectra corresponding to 14 genera, 81 species, and 403 strains, together with the protein sequences of 1276 species in UniProt, we identified the 10 genes that encode the proteins most often observed in bacterial MALDI-TOF spectra through a 10-fold double cross-validation procedure repeated 500 times. These 10 genes were then used to annotate peaks on bacterial MALDI-TOF spectra for identification. With this approach, bacteria can be identified at the genus level by searching against a database containing the protein sequences of 42 bacterial genera from UniProt. Our approach correctly identified 84.1% of the 403 spectra at the genus level. Source code of the algorithm is available at https://github.com/dipcarbon/BacteriaMSLF.
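The abstract does not spell out how peaks are annotated or how a genus is scored, so the following is only a minimal sketch of the general idea: compute theoretical masses of marker proteins from their sequences, count how many observed peaks each genus can explain, and report the best-scoring genus. It assumes that peaks are singly protonated [M+H]+ ions compared against average masses, that the initiator methionine may be cleaved when the next residue is small, and that a 500 ppm tolerance is acceptable. The names `mh_mass`, `candidate_masses`, `count_matches`, and `identify_genus`, the tolerance, and the toy sequences are all hypothetical. None of them is taken from the paper or the BacteriaMSLF repository.

```python
from typing import Dict, Iterable, List, Tuple

# Average (not monoisotopic) residue masses in Da. Intact-protein peaks in
# linear-mode MALDI-TOF are usually compared against average masses.
AVG_RESIDUE_MASS: Dict[str, float] = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # average mass of H2O for the free N- and C-termini
PROTON = 1.007276  # charge carrier of the singly protonated [M+H]+ ion

# Assumption: the initiator Met is removed when the second residue is small
# (the usual methionine aminopeptidase rule); both forms are kept as candidates.
SMALL_AFTER_MET = set("ACGPSTV")


def mh_mass(seq: str) -> float:
    """Average m/z of the singly charged [M+H]+ ion of a protein sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER + PROTON


def candidate_masses(seq: str) -> List[float]:
    """[M+H]+ of the full sequence, plus the Met-cleaved form when plausible."""
    masses = [mh_mass(seq)]
    if len(seq) > 1 and seq[0] == "M" and seq[1] in SMALL_AFTER_MET:
        masses.append(mh_mass(seq[1:]))
    return masses


def count_matches(peaks: Iterable[float], masses: List[float], ppm: float) -> int:
    """Number of observed peaks lying within `ppm` of any theoretical mass."""
    matched = 0
    for peak in peaks:
        tol = peak * ppm * 1e-6  # absolute tolerance in Da at this m/z
        if any(abs(peak - m) <= tol for m in masses):
            matched += 1
    return matched


def identify_genus(peaks: List[float], db: Dict[str, List[str]],
                   ppm: float = 500.0) -> Tuple[str, int]:
    """Return (genus, score) for the genus whose marker proteins explain the
    most peaks; ties go to the genus that appears first in `db`."""
    best_genus, best_score = "", -1
    for genus, seqs in db.items():
        masses = [m for s in seqs for m in candidate_masses(s)]
        score = count_matches(peaks, masses, ppm)
        if score > best_score:
            best_genus, best_score = genus, score
    return best_genus, best_score


# Toy, made-up marker sequences (not real UniProt entries).
toy_db = {
    "GenusA": ["MAKLVGGSTR", "MKKRWWYYFF"],
    "GenusB": ["MSDEQHHNPP", "MLLLIIIRRR"],
}
# Simulated spectrum: the Met-cleaved first protein and the intact second one.
peaks = [mh_mass("AKLVGGSTR"), mh_mass("MKKRWWYYFF")]
print(identify_genus(peaks, toy_db))  # ('GenusA', 2)
```

A real search would draw the sequences from UniProt entries of the selected genes for each genus and would likely need a finer score than a raw match count, for example one that penalizes genera with many more candidate masses.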