Ibaqpy: A scalable Python package for baseline quantification in proteomics leveraging SDRF metadata.

Authors

Zheng Ping, Audain Enrique, Webel Henry, Dai Chengxin, Klein Joshua, Hitz Marc-Phillip, Sachsenberg Timo, Bai Mingze, Perez-Riverol Yasset

Affiliations

Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China.

Institute of Medical Genetics, University Medicine Oldenburg, Carl von Ossietzky University, Oldenburg, Germany.

Publication information

J Proteomics. 2025 Jun 15;317:105440. doi: 10.1016/j.jprot.2025.105440. Epub 2025 Apr 21.

Abstract

Intensity-based absolute quantification (iBAQ) is essential in proteomics as it allows for the assessment of a protein's absolute abundance in various samples or conditions. However, the computation of these values for increasingly large-scale and high-throughput experiments, such as those using data-independent acquisition (DIA), tandem mass tag (TMT), or label-free quantification (LFQ) workflows, poses significant challenges in scalability and reproducibility. Here, we present ibaqpy (https://github.com/bigbio/ibaqpy), a Python package designed to compute iBAQ values efficiently for experiments of any scale. Ibaqpy leverages the Sample and Data Relationship Format (SDRF) metadata standard to incorporate experimental metadata into the quantification workflow. This allows for automatic normalization and batch correction while accounting for key aspects of the experimental design, such as technical and biological replicates, fractionation strategies, and sample conditions. Designed for large-scale proteomics datasets, ibaqpy can also recompute iBAQ values for existing experiments when an SDRF is available. We showcased ibaqpy's capabilities by reanalyzing 17 public proteomics datasets from ProteomeXchange, covering HeLa cell lines with 4921 samples and 5766 MS runs, quantifying a total of 11,014 proteins. In our reanalysis, ibaqpy is a key component in automating reproducible quantification, reducing manual effort and making quantitative proteomics more accessible while supporting FAIR principles for data reuse.

SIGNIFICANCE: Proteomics studies often rely on intensity-based absolute quantification (iBAQ) to assess protein abundance across various biological conditions. Despite its widespread use, computing iBAQ values at scale remains challenging due to the increasing complexity and volume of proteomics experiments. Existing tools frequently lack metadata integration, limiting their ability to handle experimental design intricacies such as replicates, fractions, and batch effects. Our work introduces ibaqpy, a scalable Python package that leverages the Sample and Data Relationship Format (SDRF) to compute iBAQ values efficiently while incorporating critical experimental metadata. By enabling automated normalization and batch correction, ibaqpy ensures reproducible and comparable quantification across large-scale datasets. We validated the utility of ibaqpy through the reanalysis of 17 public HeLa datasets, comprising over 200 million peptide features and quantifying more than 11,000 proteins across thousands of samples. This comprehensive reanalysis highlights the robustness and scalability of ibaqpy, making it an essential tool for researchers conducting large-scale proteomics experiments. Moreover, by promoting FAIR principles for data reuse and interoperability, ibaqpy offers a transformative approach to baseline protein quantification, supporting reproducible research and data integration within the proteomics community.
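
For readers unfamiliar with the metric, the following is a minimal sketch of the general iBAQ idea: a protein's summed peptide intensity divided by its number of theoretically observable tryptic peptides. It is not ibaqpy's API. The function names, the toy sequences and intensities, and the 6 to 30 residue peptide length window are all assumptions made for illustration, and the sketch omits the normalization, batch correction, and SDRF handling described above.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=6, max_len=30):
    """In-silico trypsin digest: cleave after K or R unless followed by P.

    The length window (6 to 30 residues) is an assumed default for
    "theoretically observable" peptides, not a value from the paper.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def ibaq(features, proteins):
    """Compute iBAQ per protein.

    features: iterable of (protein_accession, peptide_intensity) pairs
    proteins: dict mapping protein_accession -> amino acid sequence
    """
    summed = defaultdict(float)
    for accession, intensity in features:
        summed[accession] += intensity

    result = {}
    for accession, total in summed.items():
        n_theoretical = len(tryptic_peptides(proteins[accession]))
        if n_theoretical > 0:  # skip proteins with no observable peptides
            result[accession] = total / n_theoretical
    return result


# Hypothetical toy data (not taken from the reanalyzed datasets).
proteins = {
    "P1": "MAAAAAKGGGGGGRTTKLLLLLLLK",  # 3 peptides pass the length filter
    "P2": "AAAAAAKPCCCCCR",            # K followed by P is not cleaved: 1 peptide
}
features = [("P1", 3e6), ("P1", 6e6), ("P2", 2e6)]

ibaq_values = ibaq(features, proteins)
```

Here P1 has a summed intensity of 9e6 spread over three theoretical peptides, while P2 keeps its single intensity because the K-P bond is not cleaved. Dividing by the theoretical peptide count is what makes values comparable between proteins of different lengths.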
