
An unsupervised machine learning method for assessing quality of tandem mass spectra.

Affiliation

Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr, Saskatoon, S7N 5A9, Canada.

Publication information

Proteome Sci. 2012 Jun 21;10 Suppl 1(Suppl 1):S12. doi: 10.1186/1477-5956-10-S1-S12.

DOI:10.1186/1477-5956-10-S1-S12
PMID:22759570
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3380733/
Abstract

BACKGROUND

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of these spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before the database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and false identifications. Most existing quality assessment methods are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

RESULTS

This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probability that a spectrum is of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve this problem and is proved to converge. Experimental results on two datasets show that searching only the tandem spectra judged to be of high quality by the proposed method saves about 56% and 62% of database searching time on the two datasets while losing only a small number of high-quality spectra.
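
The idea of combining per-feature quality calls into a probability without labelled training data can be illustrated with a small sketch. The abstract does not give the constrained optimization formulation, so the code below swaps in a different, standard technique: a two-class latent-class model fitted by expectation-maximization (EM). It is not the authors' algorithm. The synthetic data, the function name `em_latent_class`, the five features, the vote rates, and the 0.5 cutoff are all assumptions made for illustration, and the fraction of spectra skipped is used only as a rough proxy for search time saved.

```python
import math
import random


def em_latent_class(votes, n_iter=50, prior=0.5):
    """Estimate P(high quality | votes) for each spectrum without labels.

    votes: list of rows; each row holds one 0/1 quality call per feature.
    This is a generic latent-class EM, NOT the method from the paper.
    """
    n = len(votes)
    n_feat = len(votes[0])
    # Asymmetric start: assume a "good" vote is likelier for good spectra.
    p_good = [0.7] * n_feat  # P(vote = 1 | high quality)
    p_bad = [0.3] * n_feat   # P(vote = 1 | poor quality)
    post = [prior] * n
    for _ in range(n_iter):
        # E-step: posterior of the hidden quality label for each spectrum.
        for i, row in enumerate(votes):
            lg = math.log(prior)
            lb = math.log(1.0 - prior)
            for j, v in enumerate(row):
                lg += math.log(p_good[j] if v else 1.0 - p_good[j])
                lb += math.log(p_bad[j] if v else 1.0 - p_bad[j])
            diff = max(min(lb - lg, 50.0), -50.0)  # guard against overflow
            post[i] = 1.0 / (1.0 + math.exp(diff))
        # M-step: re-estimate the class prior and per-feature vote rates.
        w_good = sum(post)
        w_bad = n - w_good
        prior = min(max(w_good / n, 1e-6), 1.0 - 1e-6)
        for j in range(n_feat):
            hit_good = sum(p * row[j] for p, row in zip(post, votes))
            hit_bad = sum((1.0 - p) * row[j] for p, row in zip(post, votes))
            # Add-one smoothing keeps every rate strictly inside (0, 1).
            p_good[j] = (hit_good + 1.0) / (w_good + 2.0)
            p_bad[j] = (hit_bad + 1.0) / (w_bad + 2.0)
    return post


# Synthetic data (assumed): 40% good spectra, 5 noisy binary feature calls.
random.seed(0)
truth = [1 if random.random() < 0.4 else 0 for _ in range(1000)]
votes = [
    [1 if random.random() < (0.85 if t else 0.2) else 0 for _ in range(5)]
    for t in truth
]

post = em_latent_class(votes)
keep = [p >= 0.5 for p in post]  # spectra that would be sent to the search
saved = 1.0 - sum(keep) / len(keep)  # fraction of spectra skipped
accuracy = sum(int(k) == t for k, t in zip(keep, truth)) / len(truth)
print(f"skipped {saved:.0%} of spectra, agreement with truth {accuracy:.0%}")
```

The ground-truth labels are used only to score the result at the end; the estimation itself sees nothing but the per-feature votes, which is the sense in which such an approach is unsupervised.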

CONCLUSIONS

The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way the conditional probabilities are estimated is effective.
