Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra.

Affiliations

Laboratory of Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz Paraná, Brazil.

Department of Chemical Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin, Germany.

Publication information

J Proteomics. 2021 Aug 15;245:104282. doi: 10.1016/j.jprot.2021.104282. Epub 2021 Jun 2.

DOI: 10.1016/j.jprot.2021.104282
PMID: 34089898
Abstract

In proteomics, the identification of peptides from mass spectral data can be mathematically described as the partitioning of mass spectra into clusters (i.e., groups of spectra derived from the same peptide). The way partitions are validated is just as important: validation has evolved side by side with the clustering algorithms themselves and has given rise to many partition assessment measures. An assessment measure is said to have a selection bias if, and only if, the probability that a randomly chosen partition scores a high value depends on the number of clusters in the partition. In the context of clustering mass spectra, this might mislead the validation process into favoring clustering algorithms that generate too many (or too few) spectral clusters, regardless of the underlying peptide sequences. A selection bias toward the number of peptides, however, is desirable in proteomics, as it estimates the number of peptides in a complex protein mixture. Here, we introduce an assessment measure that is purposely biased toward the number of peptide ion species. We also introduce a partition assessment framework for proteomics, called the Partition Assessment Tool, and demonstrate its importance by evaluating the performance of eight clustering algorithms on seven proteomics datasets while discussing the trade-offs involved.

SIGNIFICANCE: Clustering algorithms are widely adopted in proteomics for several tasks, such as speeding up search engines, generating consensus mass spectra, and aiding in the classification of proteomic profiles. Choosing the algorithm best suited to the task at hand is not simple, as each algorithm has advantages and disadvantages; furthermore, specifying clustering parameters is also a necessary and fundamental step, for example, deciding whether to generate "pure clusters" or fewer clusters that accept some noise. With this as motivation, we verify the performance of several widely adopted algorithms on proteomic datasets and introduce a theoretical framework for drawing conclusions on which approach is suitable for the task at hand.
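To make the abstract's definition of selection bias concrete, the following minimal sketch scores uniformly random partitions with purity, a classic partition assessment measure, and with its mirror, inverse purity. Both are swapped in here purely for illustration; neither is the measure introduced in the paper, and the data (300 spectra from 10 peptides, 30 spectra each) are hypothetical. Purity rewards splitting, so the expected score of a random partition rises with the number of clusters k; inverse purity rewards merging, so it falls. That dependence on k is exactly what the definition above calls a selection bias.

```python
import random
from collections import Counter, defaultdict


def purity(labels_true, labels_pred):
    """Share of items that belong to the majority true class of their predicted cluster."""
    clusters = defaultdict(list)
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters[cluster_id].append(true_label)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def inverse_purity(labels_true, labels_pred):
    """Purity with roles swapped: rewards keeping each true class inside one cluster."""
    return purity(labels_pred, labels_true)


def mean_random_score(score, labels_true, k, trials=200, seed=0):
    """Average score of uniformly random partitions into at most k clusters
    (some cluster ids may end up empty)."""
    rng = random.Random(seed)
    n = len(labels_true)
    total = 0.0
    for _ in range(trials):
        labels_pred = [rng.randrange(k) for _ in range(n)]
        total += score(labels_true, labels_pred)
    return total / trials


if __name__ == "__main__":
    # Hypothetical ground truth: 300 spectra from 10 peptides, 30 spectra each.
    truth = [i // 30 for i in range(300)]
    for k in (2, 10, 50, 300):
        p = mean_random_score(purity, truth, k)
        ip = mean_random_score(inverse_purity, truth, k)
        print(f"k={k:>3}  purity={p:.3f}  inverse purity={ip:.3f}")
```

Running it shows the random-partition baseline of each measure drifting with k in opposite directions, so an algorithm could look good simply by over-splitting (under purity) or over-merging (under inverse purity). This mirrors the "pure clusters" versus "fewer clusters with noise" trade-off described above.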

Similar articles

1. Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra.
   J Proteomics. 2021 Aug 15;245:104282. doi: 10.1016/j.jprot.2021.104282. Epub 2021 Jun 2.
2. msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing.
   J Proteome Res. 2019 Jan 4;18(1):147-158. doi: 10.1021/acs.jproteome.8b00448. Epub 2018 Dec 14.
3. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.
   J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.
4. Deep learning embedder method and tool for mass spectra similarity search.
   J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.
5. The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results.
   BMC Bioinformatics. 2008 Dec 9;9:529. doi: 10.1186/1471-2105-9-529.
6. Comparative database search engine analysis on massive tandem mass spectra of pork-based food products for halal proteomics.
   J Proteomics. 2021 Jun 15;241:104240. doi: 10.1016/j.jprot.2021.104240. Epub 2021 Apr 21.
7. Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.
   J Proteome Res. 2017 Nov 3;16(11):4035-4044. doi: 10.1021/acs.jproteome.7b00427.
8. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics.
   J Proteome Res. 2021 Dec 3;20(12):5359-5367. doi: 10.1021/acs.jproteome.1c00485. Epub 2021 Nov 4.
9. Implementation and application of a versatile clustering tool for tandem mass spectrometry data.
   Proteomics. 2007 Sep;7(18):3245-58. doi: 10.1002/pmic.200700160.
10. A novel approach for clustering proteomics data using Bayesian fast Fourier transform.
    Bioinformatics. 2005 May 15;21(10):2210-24. doi: 10.1093/bioinformatics/bti383. Epub 2005 Mar 15.