Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.

Authors

Wout Bittremieux, Kris Laukens, William Stafford Noble, Pieter C. Dorrestein

Affiliations

Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States.

Department of Computer Science, University of Antwerp, Antwerp, Belgium.

Publication information

Rapid Commun Mass Spectrom. 2025 May;39 Suppl 1(Suppl 1):e9153. doi: 10.1002/rcm.9153. Epub 2021 Jul 20.

DOI: 10.1002/rcm.9153
PMID: 34169593
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8709870/

Abstract

RATIONALE

Advanced algorithmic solutions are necessary to process the ever-increasing amounts of mass spectrometry data that are being generated. In this study, we describe the falcon spectrum clustering tool for efficient clustering of millions of MS/MS spectra.

METHODS

falcon succeeds in efficiently clustering large amounts of mass spectral data using advanced techniques for fast spectrum similarity searching. First, high-resolution spectra are binned and converted to low-dimensional vectors using feature hashing. Next, the spectrum vectors are used to construct nearest neighbor indexes for fast similarity searching. The nearest neighbor indexes are used to efficiently compute a sparse pairwise distance matrix without having to exhaustively perform all pairwise spectrum comparisons within the relevant precursor mass tolerance. Finally, density-based clustering is performed to group similar spectra into clusters.
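
The pipeline described above (binning, feature hashing, neighbor search within a precursor mass tolerance, density-based clustering) can be illustrated with a minimal pure-Python sketch. This is not falcon's implementation: the function names, the CRC32 hash, the parameter values, and the toy spectra are all assumptions made for illustration. In particular, the neighbor search here is a brute-force scan over a precursor-sorted list, standing in for the nearest neighbor indexes that let falcon avoid exhaustive comparisons, and the clustering step is a simplified DBSCAN-style labelling.

```python
import math
import zlib
from collections import deque


def hash_spectrum(peaks, bin_width=0.05, dim=64):
    """Bin (m/z, intensity) peaks, then fold the bins into `dim` slots by hashing."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)
        # CRC32 is an illustrative deterministic hash, chosen only for this sketch.
        slot = zlib.crc32(str(bin_index).encode()) % dim
        vec[slot] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_distance(u, v):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def sparse_neighbors(precursors, vectors, mz_tol=0.05, eps=0.1):
    """Sparse distance 'matrix' {i: {j: d}} for pairs within mz_tol and eps.

    Brute-force stand-in for a nearest neighbor index (assumption).
    """
    order = sorted(range(len(precursors)), key=lambda i: precursors[i])
    neighbors = {i: {} for i in order}
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if precursors[j] - precursors[i] > mz_tol:
                break  # later spectra are even further away in precursor m/z
            d = cosine_distance(vectors[i], vectors[j])
            if d <= eps:
                neighbors[i][j] = d
                neighbors[j][i] = d
    return neighbors


def density_cluster(neighbors, min_samples=2):
    """Simplified DBSCAN-style labelling on the sparse neighbor lists; -1 is noise."""
    labels = {i: -1 for i in neighbors}
    cluster = 0
    for seed in sorted(neighbors):
        is_core = len(neighbors[seed]) + 1 >= min_samples
        if labels[seed] != -1 or not is_core:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            if len(neighbors[cur]) + 1 < min_samples:
                continue  # border point: joins the cluster but does not expand it
            for nxt in neighbors[cur]:
                if labels[nxt] == -1:
                    labels[nxt] = cluster
                    queue.append(nxt)
        cluster += 1
    return [labels[i] for i in range(len(neighbors))]


# Toy data (hypothetical): three near-identical spectra and one unrelated spectrum.
precursors = [500.00, 500.01, 500.02, 750.40]
spectra = [
    [(100.02, 1.0), (200.02, 0.5), (300.02, 0.8)],
    [(100.03, 1.0), (200.03, 0.5), (300.03, 0.8)],
    [(100.02, 0.9), (200.03, 0.6), (300.02, 0.8)],
    [(150.02, 1.0), (250.02, 0.7)],
]
vectors = [hash_spectrum(p) for p in spectra]
labels = density_cluster(sparse_neighbors(precursors, vectors))
print(labels)  # [0, 0, 0, -1]
```

The first three spectra fall into the same m/z bins and lie within the precursor tolerance, so they form one cluster; the fourth has no neighbor within the tolerance and is labelled as noise. Replacing the inner loop of `sparse_neighbors` with an approximate nearest neighbor index is what would make this approach scale to millions of spectra.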

RESULTS

Several state-of-the-art spectrum clustering tools were evaluated using a large draft human proteome data set consisting of 25 million spectra, indicating that alternative tools produce clustering results with different characteristics. Notably, falcon generates larger highly pure clusters than alternative tools, leading to a larger reduction in data volume without the loss of relevant information for more efficient downstream processing.

CONCLUSIONS

falcon is a highly efficient spectrum clustering tool. It is publicly available as open source software under the permissive BSD license at https://github.com/bittremieux/falcon.
