
MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects.

Affiliations

Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore.

School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore.

Publication information

Sci Data. 2023 Dec 2;10(1):858. doi: 10.1038/s41597-023-02779-8.

DOI:10.1038/s41597-023-02779-8
PMID:38042886
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10693559/
Abstract

Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e429/10693559/0a12aa6c9760/41597_2023_2779_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e429/10693559/1e20bcba90c9/41597_2023_2779_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e429/10693559/8b57db9039d9/41597_2023_2779_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e429/10693559/f478daa9eda8/41597_2023_2779_Fig3_HTML.jpg

Similar articles

1
MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects.
Sci Data. 2023 Dec 2;10(1):858. doi: 10.1038/s41597-023-02779-8.
2
Proteomic datasets of HeLa and SiHa cell lines acquired by DDA-PASEF and diaPASEF.
Data Brief. 2022 Feb 4;41:107919. doi: 10.1016/j.dib.2022.107919. eCollection 2022 Apr.
3
Four-dimensional proteomics analysis of human cerebrospinal fluid with trapped ion mobility spectrometry using PASEF.
Proteomics. 2023 May;23(10):e2200507. doi: 10.1002/pmic.202200507. Epub 2023 Feb 19.
4
Proper imputation of missing values in proteomics datasets for differential expression analysis.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa112.
5
A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics.
Sci Data. 2022 Mar 30;9(1):126. doi: 10.1038/s41597-022-01216-6.
6
Investigation of Effects of the Spectral Library on Analysis of diaPASEF Data.
J Proteome Res. 2022 Feb 4;21(2):507-518. doi: 10.1021/acs.jproteome.1c00899. Epub 2021 Dec 31.
7
Evaluation of DDA Library-Free Strategies for Phosphoproteomics and Ubiquitinomics Data-Independent Acquisition Data.
J Proteome Res. 2023 Jul 7;22(7):2232-2245. doi: 10.1021/acs.jproteome.2c00735. Epub 2023 May 31.
8
Increasing taxonomic and functional characterization of host-microbiome interactions by DIA-PASEF metaproteomics.
Front Microbiol. 2023 Oct 16;14:1258703. doi: 10.3389/fmicb.2023.1258703. eCollection 2023.
9
Characterization of Cerebrospinal Fluid via Data-Independent Acquisition Mass Spectrometry.
J Proteome Res. 2018 Oct 5;17(10):3418-3430. doi: 10.1021/acs.jproteome.8b00308. Epub 2018 Sep 12.
10
DIA-Based Proteome Profiling of Nasopharyngeal Swabs from COVID-19 Patients.
J Proteome Res. 2021 Aug 6;20(8):4165-4175. doi: 10.1021/acs.jproteome.1c00506. Epub 2021 Jul 22.

Cited by

1
Progress and trends on machine learning in proteomics during 1997-2024: a bibliometric analysis.
Front Med (Lausanne). 2025 Aug 15;12:1594442. doi: 10.3389/fmed.2025.1594442. eCollection 2025.
2
Quantitative proteomics unveils potential plasma biomarkers and provides insights into the pathophysiological mechanisms underlying equine metabolic syndrome.
BMC Vet Res. 2025 Jul 2;21(1):425. doi: 10.1186/s12917-025-04879-6.
3
MSFragger-DDA+ enhances peptide identification sensitivity with full isolation window search.
Nat Commun. 2025 Apr 8;16(1):3329. doi: 10.1038/s41467-025-58728-z.
4
Spatial Proteomics by Parallel Accumulation-Serial Fragmentation Supported MALDI MS/MS Imaging: A First Glance Into Multiplexed and Spatial Peptide Identification.
Rapid Commun Mass Spectrom. 2025 May 15;39(9):e10006. doi: 10.1002/rcm.10006.
5
Similar, but not the same: multiomics comparison of human valve interstitial cells and osteoblast osteogenic differentiation expanded with an estimation of data-dependent and data-independent PASEF proteomics.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giae110.
6
MSFragger-DDA+ Enhances Peptide Identification Sensitivity with Full Isolation Window Search.
bioRxiv. 2024 Oct 15:2024.10.12.618041. doi: 10.1101/2024.10.12.618041.
7
Thinking points for effective batch correction on biomedical data.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae515.
8
ProPept-MT: A Multi-Task Learning Model for Peptide Feature Prediction.
Int J Mol Sci. 2024 Jun 30;25(13):7237. doi: 10.3390/ijms25137237.