Dittwald Piotr, Nghia Vu Trung, Harris Glenn A, Caprioli Richard M, Van de Plas Raf, Laukens Kris, Gambin Anna, Valkenborg Dirk
College of Inter-faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Warsaw, Poland; Institute of Informatics, University of Warsaw, Warsaw, Poland.
Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium.
EuPA Open Proteom. 2014 Sep 1;4:87-100. doi: 10.1016/j.euprot.2014.05.002.
Although physicochemical fractionation techniques play a crucial role in the analysis of complex mixtures, they are not necessarily the best way to separate specific molecular classes, such as lipids and peptides. Any physical fractionation step, for example one based on liquid chromatography, introduces its own variation and noise. In this paper we investigate to what extent the high sensitivity and resolution of contemporary mass spectrometers offer viable opportunities for the computational separation of signals in full scan spectra. We introduce an automatic method that discriminates peptide from lipid peaks in full scan mass spectra based on their isotopic properties. We systematically evaluate which features contribute most to peptide versus lipid classification. The selected features are then used to build a random forest classifier that separates lipid and peptide signals almost perfectly, without requiring ion fragmentation or classical tandem MS-based identification. The classifier is trained on in silico data, but is also capable of discriminating signals in real-world experiments. We evaluate how the data inaccuracies typical of common classes of mass spectrometry instruments influence the optimal set of discriminant features. Finally, the method is successfully extended to the classification of individual lipid classes from full scan mass spectral features, using input data defined by the Lipid Maps Consortium.
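The abstract does not list the isotopic features themselves. As a minimal illustration of the kind of formula-derived quantities a peptide/lipid classifier can work from, the sketch below computes the monoisotopic mass, the mass defect, and the intensity ratio of the nominal M+1 peak to the monoisotopic peak from an elemental formula, using natural isotope abundances. The choice of these three features, the abundance table, the helper names `parse_formula` and `isotopic_features`, and the two example formulas (the lipid PC 16:0/18:1, C42H82NO8P, and the peptide PEPTIDE, C34H53N7O15) are assumptions for illustration only; they are not the feature set selected in the paper, and the random forest step is not shown.

```python
import re

# Illustrative assumption: monoisotopic masses (u) of the lightest stable isotope.
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
             "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
# Nucleon number of that lightest isotope, used to compute the nominal mass.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "P": 31, "S": 32}
# (abundance of lightest isotope, abundance of the +1 isotope); P is monoisotopic.
ABUNDANCE = {"C": (0.9893, 0.0107), "H": (0.999885, 0.000115),
             "N": (0.99636, 0.00364), "O": (0.99757, 0.00038),
             "P": (1.0, 0.0), "S": (0.9499, 0.0075)}


def parse_formula(formula: str) -> dict:
    """Parse a simple neutral formula such as 'C42H82NO8P' (no brackets or charges)."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def isotopic_features(formula: str) -> dict:
    """Hypothetical feature vector derived from an elemental formula."""
    counts = parse_formula(formula)
    mono = sum(MONO_MASS[e] * n for e, n in counts.items())
    nominal = sum(NOMINAL[e] * n for e, n in counts.items())
    # Nominal M+1 peak = exactly one atom carries its +1 isotope, all others light,
    # so the M+1/M intensity ratio is the sum of n_e * a1(e) / a0(e).
    m1_ratio = sum(n * ABUNDANCE[e][1] / ABUNDANCE[e][0] for e, n in counts.items())
    return {"mono_mass": mono,
            "mass_defect": mono - nominal,
            "m1_ratio": m1_ratio,
            "m1_ratio_per_kda": 1000.0 * m1_ratio / mono}


if __name__ == "__main__":
    lipid = isotopic_features("C42H82NO8P")     # PC 16:0/18:1, neutral form
    peptide = isotopic_features("C34H53N7O15")  # peptide PEPTIDE, neutral form
    for name, f in (("lipid", lipid), ("peptide", peptide)):
        print(name, {k: round(v, 4) for k, v in f.items()})
```

For these two molecules of similar mass, the lipid has the larger mass defect (about 0.578 u versus about 0.360 u) and the larger M+1/M ratio per kDa, reflecting its higher hydrogen and carbon content. Differences of this kind are what make a classification from full scan isotopic information plausible; a real classifier would be trained on many such feature vectors rather than on two examples.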