ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics.

Affiliation

Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.

Publication information

J Proteome Res. 2021 Dec 3;20(12):5359-5367. doi: 10.1021/acs.jproteome.1c00485. Epub 2021 Nov 4.

Abstract

Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which are used to draw biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can help in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method that uses Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs a true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against the previously reported clustering tools MS-Cluster, MaRaCluster, and msCRUSH. The software also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
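The idea of comparing every pair of spectra that fall within a precursor mass-to-charge ratio tolerance can be illustrated with a small CPU-only sketch. It is not ClusterSheep's implementation: the abstract does not state the similarity measure, the threshold, or how clusters are derived from the pairwise results, so cosine similarity on pre-binned intensity vectors, the `min_sim` cutoff, and connected-component grouping are all assumptions made here for illustration, and the function and parameter names are hypothetical.

```python
import bisect
import math


def cosine(u, v):
    """Cosine similarity of two equal-length intensity vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def cluster_within_tolerance(precursor_mz, vectors, mz_tol=1.0, min_sim=0.7):
    """Compare all pairs whose precursor m/z differ by at most mz_tol, link
    pairs with similarity >= min_sim, and return one cluster label per spectrum."""
    n = len(precursor_mz)

    # Sort by precursor m/z so each spectrum's candidates form a contiguous window.
    order = sorted(range(n), key=lambda i: precursor_mz[i])
    mz_sorted = [precursor_mz[i] for i in order]

    # Union-find over sorted positions; linked spectra end up sharing a root.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # One past the last sorted position still within tolerance of spectrum i.
        hi = bisect.bisect_right(mz_sorted, mz_sorted[i] + mz_tol)
        for j in range(i + 1, hi):
            # Every pair in the window is compared; none is skipped heuristically.
            if cosine(vectors[order[i]], vectors[order[j]]) >= min_sim:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Label each spectrum with the original index of its cluster's root.
    labels = [0] * n
    for pos in range(n):
        labels[order[pos]] = order[find(pos)]
    return labels
```

In this sketch the inner loop over `j` is the part that grows quadratically within each m/z window; each comparison is independent of the others, which is the kind of workload that a GPU can run in parallel.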
