
DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis.

Author information

Liu Zhiwei, Liu Pu, Sun Yingying, Nie Zongxiang, Zhang Xiaofan, Zhang Yuqi, Chen Yi, Guo Tiannan

Affiliations

Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China.

Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China.

Publication information

Nat Commun. 2025 Apr 14;16(1):3530. doi: 10.1038/s41467-025-58866-4.

DOI: 10.1038/s41467-025-58866-4
PMID: 40229248
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11997033/

Abstract

Data-independent acquisition mass spectrometry (DIA-MS) has become increasingly pivotal in quantitative proteomics. In this study, we present DIA-BERT, a software tool that harnesses a transformer-based pre-trained artificial intelligence (AI) model for analyzing DIA proteomics data. The identification model was trained using over 276 million high-quality peptide precursors extracted from existing DIA-MS files, while the quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. When compared to DIA-NN, DIA-BERT demonstrated a 51% increase in protein identifications and 22% more peptide precursors on average across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric carcinoma), achieving high quantitative accuracy. This study underscores the potential of leveraging pre-trained models and synthetic datasets to enhance the analysis of DIA proteomics.
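
As an aside on the arithmetic behind figures such as "a 51% increase ... on average across five sample sets", the sketch below shows one plausible way an average relative gain could be computed: take the percent increase within each sample set, then average those percentages. This is an assumption about the averaging scheme, not a statement of how the authors computed their numbers. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def percent_increase(baseline: int, new: int) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (new - baseline) / baseline


def mean_percent_increase(pairs: dict) -> float:
    """Average the per-dataset percent increases (not a pooled ratio)."""
    return mean(percent_increase(b, n) for b, n in pairs.values())


# Hypothetical identification counts (baseline tool, new tool) per sample set.
# These numbers are invented for illustration and are NOT from the paper.
toy_counts = {
    "set_A": (1000, 1500),  # +50%
    "set_B": (2000, 2400),  # +20%
    "set_C": (4000, 5000),  # +25%
}

print(round(mean_percent_increase(toy_counts), 1))
```

Averaging per-set percentages weights every sample set equally regardless of size; pooling all counts first and then taking a single ratio would generally give a different number, so the two schemes should not be mixed when comparing tools.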

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e726/11997033/b0f8efdf2ba6/41467_2025_58866_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e726/11997033/99604c755878/41467_2025_58866_Fig1_HTML.jpg
