Brouard Céline, Bassé Antoine, d'Alché-Buc Florence, Rousu Juho
Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, 31326 Castanet Tolosan, France.
LTCI, Télécom Paris, Institut Polytechnique de Paris, 75634 Paris, France.
Metabolites. 2019 Aug 1;9(8):160. doi: 10.3390/metabo9080160.
In small molecule identification from tandem mass spectra (MS/MS), input-output kernel regression (IOKR) currently offers the state-of-the-art combination of fast training and prediction with high identification rates. IOKR can be understood as predicting a fingerprint vector from the MS/MS spectrum of an unknown molecule, then solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we propose two improvements to the IOKR framework. First, we formulate the IOKRreverse model, which can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is closest to the input MS/MS spectrum. Second, we introduce IOKRfusion, an approach for combining several IOKR and IOKRreverse models computed from different input and output kernels. The method minimizes the structured hinge loss of the combined model using mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement in top-k accuracy on both positive and negative ionization mode data.
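The two-step reading of IOKR given above (predict a fingerprint, then solve a pre-image problem over candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a precomputed input kernel matrix between spectra, uses plain kernel ridge regression onto explicit fingerprint vectors, and scores candidates by cosine similarity, which is only one possible choice of output similarity. The function names `fit_iokr`, `predict_fingerprint`, and `rank_candidates`, and the regularization parameter `lam`, are hypothetical.

```python
import numpy as np


def fit_iokr(K_train, Y_train, lam=1e-2):
    """Kernel ridge regression from spectra to fingerprint vectors.

    K_train: (n, n) input kernel matrix between training spectra (assumed given).
    Y_train: (n, d) fingerprint vectors of the training molecules.
    Returns the coefficient matrix C = (K + lam * I)^{-1} Y.
    """
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), Y_train)


def predict_fingerprint(k_new, C):
    """Predict a fingerprint from kernel values between a new spectrum
    and the training spectra (k_new has shape (n,))."""
    return k_new @ C


def rank_candidates(pred_fp, cand_fps):
    """Pre-image step: rank candidate fingerprints by cosine similarity
    to the predicted fingerprint (an assumed similarity, for illustration)."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    pred = pred_fp / (np.linalg.norm(pred_fp) + eps)
    cands = cand_fps / (np.linalg.norm(cand_fps, axis=1, keepdims=True) + eps)
    scores = cands @ pred
    order = np.argsort(-scores)  # best-scoring candidate first
    return order, scores
```

In use, `fit_iokr` would be called once on the training kernel, and for each query spectrum `predict_fingerprint` followed by `rank_candidates` over that query's candidate set; the top-k accuracy mentioned in the abstract would then be the fraction of queries whose true molecule appears among the first k entries of `order`.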