West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, No. 88, Keyuan South Road, Hi-tech Zone, Chengdu, 610041, China.
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
BMC Bioinformatics. 2020 Oct 7;21(1):439. doi: 10.1186/s12859-020-03783-0.
Mass spectrometry (MS) has become a promising analytical technique for acquiring proteomics information to characterize biological samples. Nevertheless, most studies focus on the final proteins identified by a suite of algorithms that compare partial MS spectra against a sequence database, while pattern recognition and classification of raw mass-spectrometric data remain unresolved.
We developed a comprehensive open-source platform, named MSpectraAI, for analyzing large-scale MS data with deep neural networks (DNNs); the system covers spectral-feature swath extraction, classification, and visualization. The platform also allows users to build their own DNN models with Keras. To evaluate the tool, we collected publicly available proteomics datasets for six tumor types (a total of 7,997,805 mass spectra) from the ProteomeXchange consortium and classified the samples based on spectral profiling. The results suggest that MSpectraAI can distinguish different types of samples based on the fingerprint spectrum and achieves better prediction accuracy at the MS1 level (average 0.967).
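As a rough illustration of what turning a raw spectrum into a fixed-length model input can look like, the sketch below sums peak intensities into equal-width m/z windows and scales the result to the strongest window. It is not the MSpectraAI implementation: the function name `bin_spectrum`, the m/z range, the number of windows, and the max-normalization are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def bin_spectrum(
    peaks: Sequence[Tuple[float, float]],
    mz_min: float = 300.0,   # assumed lower m/z bound, not from the paper
    mz_max: float = 1800.0,  # assumed upper m/z bound, not from the paper
    n_bins: int = 15,        # assumed number of windows, not from the paper
) -> List[float]:
    """Sum (m/z, intensity) peaks into equal-width m/z windows.

    Returns a vector of length n_bins scaled so the largest window is 1.0.
    An empty or all-zero spectrum yields a vector of zeros.
    """
    if n_bins <= 0 or mz_max <= mz_min:
        raise ValueError("need n_bins > 0 and mz_max > mz_min")

    width = (mz_max - mz_min) / n_bins
    bins = [0.0] * n_bins

    for mz, intensity in peaks:
        # Ignore peaks outside the half-open range [mz_min, mz_max).
        if mz < mz_min or mz >= mz_max:
            continue
        # Clamp guards against floating-point rounding at the upper edge.
        idx = min(int((mz - mz_min) / width), n_bins - 1)
        bins[idx] += intensity

    top = max(bins)
    if top > 0:
        bins = [value / top for value in bins]
    return bins


# Hypothetical peak list: three peaks in range, one above mz_max.
example_peaks = [(350.0, 10.0), (355.0, 30.0), (1000.0, 20.0), (2000.0, 5.0)]
features = bin_spectrum(example_peaks)
```

A vector like `features` has the same length for every spectrum, which is what a dense or convolutional network built in Keras would need as input; the actual feature layout used by the platform may differ.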
This study deciphers proteome profiles directly from raw mass spectrometry data and broadens the application of deep learning methods to the classification and prediction of proteomics data from multi-tumor samples. MSpectraAI also performs better than the classical machine learning approaches it was compared with.