A matching algorithm with isotope distribution pattern in LC-MS based on support vector machine (SVM) learning model.

Author information

Cui Jian, Chen Qiang, Dong Xiaorui, Shang Kai, Qi Xin, Cui Hao

Affiliations

Department of Information Technology Shengli College, China University of Petroleum Huadong BeiEr Road #271 Dongying Shandong P. R. China

Department of Computer Science in College of Computer and Communication Engineering, China University of Petroleum Huadong Western Changjiang Road #66, Huangdao District Qingdao Shandong P. R. China.

Publication information

RSC Adv. 2019 Sep 4;9(48):27874-27882. doi: 10.1039/c9ra03789f. eCollection 2019 Sep 3.

DOI:10.1039/c9ra03789f

PMID:35530479

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9071103/

Abstract

In proteomics, it is important to detect, analyze, and quantify complex peptide components and their differences. The key step is to match the elution time peaks (LC peaks) produced by the same peptide in replicate experiments. Warping functions are widely used to correct the mean time shift among replicates. However, they cannot reduce the ambiguity between corresponding and non-corresponding peak pairs, because the time shifts vary randomly from one extracted-ion chromatogram (XIC) to another. In this paper, isotope distribution pattern similarity is considered in addition to the time feature. The novelty is that, unlike other feature-based methods that use isotope information, the algorithm relies on isotope distribution similarity rather than the usual peak profile similarity. First, a training set of peptides, including corresponding and non-corresponding peak pairs, was selected from the MS/MS results. Second, the time difference and the isotope distribution pattern similarity were generated for each peak pair. Third, Support Vector Machine (SVM) classification was applied to these two features. Finally, accuracy and final coverage were measured. A 10-fold cross-validation was used to test the effectiveness of the SVM learning model, and the accuracy of correct matching could reach 97%. Coverage based on the learning model ranged from 75% to 91% across datasets. Thus, this matching algorithm based on time and isotope distribution pattern features can provide high accuracy and coverage for corresponding peak identification.
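The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not say how isotope distribution similarity is measured, so cosine similarity is used here purely as an assumption. The peak dictionaries, their field names (`rt`, `isotopes`), and all numeric values are hypothetical. The resulting feature pairs would then be passed to an SVM classifier, which is not shown.

```python
import math


def time_difference(rt_a, rt_b):
    """Absolute elution-time difference between two LC peaks (same unit as the inputs)."""
    return abs(rt_a - rt_b)


def isotope_similarity(intensities_a, intensities_b):
    """Cosine similarity between two isotope intensity vectors.

    Assumption: cosine similarity stands in for the paper's similarity
    measure, which the abstract does not specify.
    """
    if len(intensities_a) != len(intensities_b):
        raise ValueError("isotope vectors must have the same length")
    dot = sum(a * b for a, b in zip(intensities_a, intensities_b))
    norm_a = math.sqrt(sum(a * a for a in intensities_a))
    norm_b = math.sqrt(sum(b * b for b in intensities_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty pattern is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def peak_pair_features(peak_a, peak_b):
    """Return the (time difference, isotope similarity) feature pair for one peak pair."""
    return (
        time_difference(peak_a["rt"], peak_b["rt"]),
        isotope_similarity(peak_a["isotopes"], peak_b["isotopes"]),
    )


# Hypothetical peaks: retention time in seconds, intensities of the first four isotope peaks.
peak_run1 = {"rt": 1520.0, "isotopes": [100.0, 62.0, 23.0, 6.0]}
peak_run2 = {"rt": 1532.5, "isotopes": [200.0, 124.0, 46.0, 12.0]}  # same pattern, doubled intensity
peak_other = {"rt": 1531.0, "isotopes": [40.0, 100.0, 85.0, 30.0]}  # nearby in time, different pattern

corresponding = peak_pair_features(peak_run1, peak_run2)
non_corresponding = peak_pair_features(peak_run1, peak_other)
```

In this toy data the two candidate partners for `peak_run1` elute within about 12 seconds of it, so time alone is ambiguous, while the isotope similarity separates the scaled copy from the differently shaped pattern.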

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee49/9071103/2be1facc4c5c/c9ra03789f-f1.jpg
