Starostin Konstantin V, Demidov Evgeny A, Ershov Nikita I, Bryanskaya Alla V, Efimov Vadim M, Shlyakhtun Valeriya N, Peltek Sergey E
Laboratory of Molecular Biotechnologies of Federal Research Center Institute of Cytology and Genetics of The Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
Kurchatov Genomics Center of Federal Research Center Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
Front Microbiol. 2020 Dec 18;11:609033. doi: 10.3389/fmicb.2020.609033. eCollection 2020.
Identification of microorganisms by MALDI-TOF mass spectrometry is a highly efficient method that combines high throughput, speed, and accuracy. However, it is significantly limited by the absence of a universal database of reference mass spectra. This problem can be solved by creating an Internet platform for open databases of protein spectra of microorganisms. Choosing the optimal mathematical apparatus is the pivotal issue for this task. In our previous study, we proposed a geometric approach to processing mass spectrometry data, in which a mass spectrum is represented as a vector in a multidimensional Euclidean space. This algorithm was implemented in the stand-alone Jacob4 package. We demonstrated its efficiency in delimiting two closely related species of the group. In this study, the geometric approach was implemented as R scripts, which allowed us to design a Web-based application. We also studied the possibility of using full spectra analysis (FSA), which, unlike analysis based on peak picking (PPA), does not require calculating mass peaks and is the logical development of the method. We used 74 microbial strains from the collections of ICiG SB RAS, UNIQEM, IEGM, KMM, and VGM as models. We demonstrated that the algorithms based on peak picking and on analysis of complete data are no less accurate than the Biotyper 3.1 software. We proposed a method for calculating cut-off thresholds based on averaged intraspecific distances. The resulting database, raw data, and the set of R scripts are available online at https://icg-test.mydisk.nsc.ru/s/qj6cfZg57g6qwzN.
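The geometric idea described in the abstract (a spectrum as a vector in Euclidean space, with cut-offs derived from averaged intraspecific distances) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. The abstract does not specify the binning, the normalization, or the exact threshold formula, so the sketch assumes spectra already binned onto a common m/z grid, scaling to unit length, and a cut-off equal to the mean pairwise distance within a species. All function names and the toy intensities are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def to_unit_vector(intensities):
    """Scale a binned spectrum to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in intensities))
    if norm == 0:
        raise ValueError("spectrum has no signal")
    return [x / norm for x in intensities]


def intraspecific_cutoffs(references):
    """Map each species to the mean pairwise distance between its reference vectors.

    Assumption: the averaged distance is used directly as the cut-off.
    Species with a single reference spectrum get no cut-off.
    """
    by_species = defaultdict(list)
    for species, vector in references:
        by_species[species].append(vector)
    cutoffs = {}
    for species, vectors in by_species.items():
        pairs = list(combinations(vectors, 2))
        if pairs:
            total = sum(math.dist(a, b) for a, b in pairs)
            cutoffs[species] = total / len(pairs)
    return cutoffs


def identify(query, references, cutoffs):
    """Return (species, distance, accepted) for the nearest reference vector."""
    species, vector = min(references, key=lambda ref: math.dist(query, ref[1]))
    distance = math.dist(query, vector)
    accepted = species in cutoffs and distance <= cutoffs[species]
    return species, distance, accepted


# Toy reference library: invented intensities on a shared 4-bin grid.
references = [
    ("A", to_unit_vector([10, 0, 1, 0])),
    ("A", to_unit_vector([9, 1, 1, 0])),
    ("B", to_unit_vector([0, 8, 0, 2])),
    ("B", to_unit_vector([0, 9, 1, 2])),
]
cutoffs = intraspecific_cutoffs(references)

# A query resembling species A, and a flat query resembling neither species.
close_match = identify(to_unit_vector([10, 0.5, 1, 0]), references, cutoffs)
poor_match = identify(to_unit_vector([1, 1, 1, 1]), references, cutoffs)
```

In this sketch a query is assigned to the species of its nearest reference only if the distance falls within that species' cut-off; otherwise the nearest species is reported but not accepted. Full spectra analysis would use the whole intensity profile as the input vector, whereas a peak-picking variant would first reduce each spectrum to its detected peaks before binning.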