Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
Mol Cell Proteomics. 2023 Jul;22(7):100581. doi: 10.1016/j.mcpro.2023.100581. Epub 2023 May 22.
Recent advances in mass spectrometry-based proteomics enable the acquisition of increasingly large datasets within relatively short times, which exposes bottlenecks in the bioinformatics pipeline. Although peptide identification is already scalable, most label-free quantification (LFQ) algorithms scale quadratically or cubically with the number of samples, which may even preclude the analysis of large-scale data. Here we introduce directLFQ, a ratio-based approach for sample normalization and the calculation of protein intensities. It estimates quantities by aligning samples and ion traces, shifting them on top of each other in logarithmic space. Importantly, directLFQ scales linearly with the number of samples, allowing analyses of large studies to finish in minutes instead of days or months. We quantify 10,000 proteomes in 10 min and 100,000 proteomes in less than 2 h, 1000-fold faster than some implementations of the popular LFQ algorithm MaxLFQ. In-depth characterization of directLFQ reveals excellent normalization properties and benchmark results, comparing favorably to MaxLFQ for both data-dependent acquisition and data-independent acquisition. In addition, directLFQ provides normalized peptide intensity estimates for peptide-level comparisons. It is an important part of an overall quantitative proteomic pipeline that also needs to include highly sensitive statistical analysis leading to proteoform resolution. Available as an open-source Python package and a graphical user interface with a one-click installer, it can be used in the AlphaPept ecosystem as well as downstream of most common computational proteomics pipelines.
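The idea of shifting ion traces on top of each other in logarithmic space can be illustrated with a minimal sketch. This is not the directLFQ package's API or its actual algorithm. The function names `align_traces` and `protein_profile`, the choice of the most complete trace as anchor, and the use of medians are all assumptions made for illustration. The input is a hypothetical matrix of log2 intensities for one protein, with one row per ion and one column per sample.

```python
import numpy as np


def align_traces(log_intensities):
    """Shift every ion trace (row) onto an anchor trace in log space.

    Illustrative only: the anchor is the row with the most measured values,
    and each shift is the median difference over the shared samples.
    """
    x = np.asarray(log_intensities, dtype=float)
    n_valid = np.sum(~np.isnan(x), axis=1)
    anchor = x[np.argmax(n_valid)]

    shifted = np.full_like(x, np.nan)
    for i, trace in enumerate(x):
        shared = ~np.isnan(trace) & ~np.isnan(anchor)
        if not shared.any():
            continue  # no overlap with the anchor: leave this trace as NaN
        # A constant offset in log space corresponds to a ratio in linear space.
        shift = np.median(anchor[shared] - trace[shared])
        shifted[i] = trace + shift
    return shifted


def protein_profile(log_intensities):
    """Summarize the aligned traces as one log-intensity value per sample."""
    return np.nanmedian(align_traces(log_intensities), axis=0)


# Three hypothetical ions of one protein, measured in four samples (log2 scale).
# The ions share the same profile but differ by a constant offset.
example = np.array([
    [10.0, 11.0, 12.0, 13.0],
    [8.0, 9.0, np.nan, 11.0],
    [11.5, 12.5, 13.5, 14.5],
])
profile = protein_profile(example)
```

In this sketch each trace is compared once against a single anchor, so the work per protein grows linearly with the number of samples. Approaches that compare all pairs of samples grow at least quadratically.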