Quality control of imbalanced mass spectra from isotopic labeling experiments.

Affiliations

Department of Computer and Information Science, University of Macau, Taipa, Macau, China.

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian, China.

Publication information

BMC Bioinformatics. 2019 Nov 6;20(1):549. doi: 10.1186/s12859-019-3170-1.

Abstract

BACKGROUND

Mass spectra in isotope-labeled proteomics experiments are usually acquired by liquid chromatography-mass spectrometry (LC-MS) analysis. In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the biological samples studied under different conditions. The core task of quality control in a quantitative LC-MS experiment is to filter out low-quality peptides with questionable profiles. Classification approaches are commonly used for this task. However, the data imbalance problem has often been ignored or mishandled in previous quality control methods. In this study, we introduce a quality control framework based on the extreme gradient boosting machine (XGBoost) and carefully address the imbalanced data problem within it.
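To illustrate why ignoring class imbalance is risky, the following minimal sketch scores a trivial classifier that always predicts the majority class. The 95/5 split, the label encoding, and the function name are assumptions made for illustration; they are not values or code from the study.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall for the `positive` class)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical labels: 95 majority-class peptides (0) and 5 minority-class peptides (1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy, minority_recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

Under these assumed labels the classifier looks accurate overall while never identifying a single minority-class peptide, which is why accuracy alone can hide a mishandled imbalance.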

RESULTS

In the XGBoost-based framework, we suggest applying the synthetic minority over-sampling technique (SMOTE) to rebalance the data and using the balanced data to train boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that the proposed framework significantly increases the reliability of peptide heavy-light ratio estimation.
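The rebalancing step can be sketched as follows. This is a simplified, standard-library-only illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The two-dimensional feature vectors, the class sizes, and the `smote` helper are hypothetical.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-feature data: 95 majority-class and 5 minority-class peptides.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(5)]

synthetic = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

The balanced set would then be used to train a boosted-tree classifier. In practice, library implementations such as `SMOTE` from `imblearn.over_sampling` and `XGBClassifier` from `xgboost` are the usual choice; whether the study used these particular packages is not stated in the abstract.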

CONCLUSIONS

Our results indicate that this framework is a powerful method for peptide quality assessment. In feature extraction, the extracted ion chromatogram (XIC) based features contribute to peptide quality assessment. For the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is capable of peptide quality control. Overall, the proposed framework provides reliable results for further proteomics studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c2e/6833298/0d2762157e54/12859_2019_3170_Fig1_HTML.jpg
