Quality control of imbalanced mass spectra from isotopic labeling experiments.

Affiliations

Department of Computer and Information Science, University of Macau, Taipa, Macau, China.

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian, China.

Publication information

BMC Bioinformatics. 2019 Nov 6;20(1):549. doi: 10.1186/s12859-019-3170-1.

DOI:10.1186/s12859-019-3170-1

PMID:31694522

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6833298/

Abstract

BACKGROUND

Mass spectra in isotope-labeled proteomics experiments are usually acquired by liquid chromatography-mass spectrometry (LC-MS). In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the biological samples studied under different conditions. The core task of quality control in quantitative LC-MS experiments is to filter out low-quality peptides with questionable profiles. Classification approaches are commonly used for this problem. However, previous quality control methods have often ignored or mishandled data imbalance. In this study, we introduce a quality control framework based on the extreme gradient boosting machine (XGBoost) and carefully address the imbalanced data problem within it.

RESULTS

In the XGBoost-based framework, we suggest applying the synthetic minority over-sampling technique (SMOTE) to rebalance the data and using the balanced data to train boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that the proposed framework significantly increases the reliability of peptide heavy-light ratio estimation.
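As an illustration of the rebalancing step only, the following is a minimal sketch of basic SMOTE written with the Python standard library. It is not the authors' implementation: the function names `smote` and `rebalance`, the two-feature toy vectors, and the choice of which label is the minority are all assumptions made for the example. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The boosted-tree training that follows in the framework is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: interpolate between minority samples and their
    k nearest neighbours within the minority class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k minority samples closest to `base` (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def rebalance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes are the same size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: two made-up features per peptide, six samples of
# one class and two of the other (labels 1 and 0 are arbitrary here).
X = [[0.90, 0.80], [0.85, 0.90], [0.95, 0.85], [0.80, 0.95],
     [0.90, 0.90], [0.88, 0.92], [0.20, 0.30], [0.30, 0.20]]
y = [1, 1, 1, 1, 1, 1, 0, 0]

X_balanced, y_balanced = rebalance(X, y, minority_label=0)
print(y_balanced.count(0), y_balanced.count(1))
```

In practice, one would more likely use an existing implementation such as `SMOTE` from the imbalanced-learn package and pass the rebalanced training set to a gradient-boosted tree classifier such as `XGBClassifier` from the xgboost package. Oversampling should be applied to the training data only, so that the data used for assessment keeps its original class distribution.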

CONCLUSIONS

Our results indicate that this framework is a powerful method for peptide quality assessment. In feature extraction, the extracted ion chromatogram (XIC) based features contribute to peptide quality assessment. In handling the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is capable of peptide quality control. Overall, the proposed framework provides reliable results for further proteomics studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c2e/6833298/0d2762157e54/12859_2019_3170_Fig1_HTML.jpg
