Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan.
Department of Computer Science, Aalto University, Espoo, Finland.
Bioinformatics. 2018 Jul 1;34(13):i323-i332. doi: 10.1093/bioinformatics/bty252.
Recent success in metabolite identification from tandem mass spectra has been driven by machine learning, which proceeds in two stages: mapping mass spectra to molecular fingerprint vectors, and then retrieving candidate molecules from a database. In the first stage, fingerprint prediction, spectrum peaks serve as features, and accounting for interactions between peaks is a reasonable way to identify unknown metabolites more accurately. Existing fingerprint prediction approaches rely only on individual peaks in the spectra and do not explicitly consider peak interactions. In addition, the current state-of-the-art method is based on kernels, which are computationally heavy and difficult to interpret.
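The second stage described above, retrieving candidates from a database, can be illustrated with a minimal sketch. It assumes binary fingerprints and Tanimoto similarity as the ranking score; the abstract does not state which scoring function is used, and the function names, candidate names and fingerprint values below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for p, q in zip(a, b) if p and q)
    either = sum(1 for p, q in zip(a, b) if p or q)
    return both / either if either else 0.0


def rank_candidates(predicted, candidates):
    """Return candidate names sorted by decreasing similarity to the prediction."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(predicted, candidates[name]),
        reverse=True,
    )


# Hypothetical predicted fingerprint (output of stage one) and toy database.
predicted = [1, 0, 1, 1, 0]
candidates = {
    "cand_A": [1, 0, 1, 0, 0],
    "cand_B": [0, 1, 0, 0, 1],
    "cand_C": [1, 0, 1, 1, 0],
}

ranking = rank_candidates(predicted, candidates)
print(ranking)
```

The quality of this ranking depends entirely on the predicted fingerprint, which is why the accuracy of the first stage matters for identification.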
We propose two learning models that incorporate peak interactions into fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions and combining them with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, called SIMPLE, which is computationally light and interpretable. The formulation of SIMPLE is convex, so a global optimum is guaranteed, and we develop an alternating direction method of multipliers (ADMM) algorithm to solve it. Experiments on the MassBank dataset show that both models achieve prediction accuracy comparable to that of the current top-performing kernel method. Furthermore, SIMPLE clearly reveals the individual peaks and peak interactions that contribute to fingerprint prediction performance.
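As a rough illustration of what an interaction model over peaks might look like, the sketch below scores a spectrum with a main-effect term plus a pairwise interaction term. The functional form (w·x + xᵀWx), the binning of peaks into four intensity features, and all numeric values are assumptions made for illustration; the abstract does not give the actual SIMPLE formulation, loss or regularizers. The soft-thresholding helper is the proximal operator of the L1 norm, a building block that commonly appears in ADMM solvers for sparse models; it is not claimed to be the authors' update step.

```python
def interaction_score(x, w, W, b=0.0):
    """Score = bias + main effects (w . x) + pairwise interactions (x^T W x)."""
    n = len(x)
    main = sum(wi * xi for wi, xi in zip(w, x))
    pair = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return b + main + pair


def soft_threshold(v, tau):
    """Elementwise soft-thresholding: shrink each entry toward zero by tau."""
    out = []
    for vi in v:
        if vi > tau:
            out.append(vi - tau)
        elif vi < -tau:
            out.append(vi + tau)
        else:
            out.append(0.0)
    return out


# Toy spectrum over 4 hypothetical m/z bins (intensities are made up).
x = [1.0, 0.0, 0.5, 0.0]
w = [0.8, 0.0, -0.2, 0.0]           # sparse main-effect weights
W = [[0.0] * 4 for _ in range(4)]   # interaction weights, mostly zero
W[0][2] = W[2][0] = 0.6             # bins 0 and 2 interact

score = interaction_score(x, w, W)
shrunk = soft_threshold([0.3, -0.05, -1.0], 0.1)
print(score, shrunk)
```

In a model of this shape, the non-zero entries of w and W can be read directly as the peaks and peak pairs that drive a prediction, which is the kind of interpretability the abstract attributes to SIMPLE.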
The code can be accessed at http://mamitsukalab.org/tools/SIMPLE/.