Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, United States.
Institute of Mathematics, University of Wrocław, Wrocław 50-384, Poland.
J Proteome Res. 2023 Aug 4;22(8):2641-2659. doi: 10.1021/acs.jproteome.3c00155. Epub 2023 Jul 19.
Repeated measures experimental designs, which quantify proteins in the same biological subjects across multiple experimental conditions or time points, are commonly used in mass spectrometry-based proteomics. Such designs distinguish biological variation within subjects from variation between subjects, and increase the statistical power to detect within-subject changes in protein abundance. Meanwhile, proteomics experiments increasingly incorporate tandem mass tag (TMT) labeling, a multiplexing strategy that improves both the accuracy of relative protein quantification and sample throughput. However, combining repeated measures and TMT multiplexing in a large-scale investigation presents statistical challenges because of the interplay of between-mixture, within-mixture, between-subject, and within-subject variation. This manuscript proposes a family of linear mixed-effects models for differential analysis of proteomics experiments with repeated measures and TMT multiplexing. These models decompose the variation in the data into contributions from its sources, as appropriate for the specifics of each experiment; enable statistical inference of differential protein abundance; and account for the difference in uncertainty between between-subject and within-subject comparisons. The proposed family of models is implemented in the R/Bioconductor package MSstatsTMT v2.2.0. Evaluations on four simulated datasets and four investigations answering diverse biological questions demonstrated the value of this approach compared with existing general-purpose approaches and implementations.
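The difference in uncertainty between within-subject and between-subject comparisons can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the MSstatsTMT model: it uses a single random subject intercept plus measurement error, ignores mixture effects entirely, and sets the true condition effect to zero. The function name, the standard deviations, and the two-subject layout are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_contrast_variances(n_sim=2000, sd_subject=1.0, sd_error=0.5, seed=0):
    """Return (variance of a within-subject difference,
    variance of a between-subject difference) over n_sim simulated proteins.

    Assumed toy model (not from the paper): y = subject_effect + error.
    """
    rng = random.Random(seed)
    within, between = [], []
    for _ in range(n_sim):
        # Each subject has its own random intercept.
        s1 = rng.gauss(0.0, sd_subject)
        s2 = rng.gauss(0.0, sd_subject)
        # Subject 1 is measured under conditions A and B; subject 2 under B.
        y1a = s1 + rng.gauss(0.0, sd_error)
        y1b = s1 + rng.gauss(0.0, sd_error)
        y2b = s2 + rng.gauss(0.0, sd_error)
        within.append(y1b - y1a)   # same subject: the subject effect cancels
        between.append(y2b - y1a)  # different subjects: it does not cancel
    return statistics.variance(within), statistics.variance(between)


var_within, var_between = simulate_contrast_variances()
print(f"within-subject contrast variance:  {var_within:.3f}")
print(f"between-subject contrast variance: {var_between:.3f}")
```

Under these assumptions the within-subject difference has theoretical variance 2·sd_error², while the between-subject difference has variance 2·sd_subject² + 2·sd_error², which is why a model that treats the two kinds of comparison alike would misstate the uncertainty of one of them.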