Improved normalization of systematic biases affecting ion current measurements in label-free proteomics data.

Author information

Rudnick Paul A, Wang Xia, Yan Xinjian, Sedransk Nell, Stein Stephen E

Affiliations

Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland

Publication information

Mol Cell Proteomics. 2014 May;13(5):1341-51. doi: 10.1074/mcp.M113.030593. Epub 2014 Feb 21.

Abstract

Normalization is an important step in the analysis of quantitative proteomics data. If this step is ignored, systematic biases can lead to incorrect assumptions about regulation. Most statistical procedures for normalizing proteomics data have been borrowed from genomics where their development has focused on the removal of so-called 'batch effects.' In general, a typical normalization step in proteomics works under the assumption that most peptides/proteins do not change; scaling is then used to give a median log-ratio of 0. The focus of this work was to identify other factors, derived from knowledge of the variables in proteomics, which might be used to improve normalization. Here we have examined the multi-laboratory data sets from Phase I of the NCI's CPTAC program. Surprisingly, the most important bias variables affecting peptide intensities within labs were retention time and charge state. The magnitude of these observations was exaggerated in samples of unequal concentrations or "spike-in" levels, presumably because the average precursor charge for peptides with higher charge state potentials is lower at higher relative sample concentrations. These effects are consistent with reduced protonation during electrospray and demonstrate that the physical properties of the peptides themselves can serve as good reporters of systematic biases. Between labs, retention time, precursor m/z, and peptide length were most commonly the top-ranked bias variables, over the standardly used average intensity (A). A larger set of variables was then used to develop a stepwise normalization procedure. This statistical model was found to perform as well or better on the CPTAC mock biomarker data than other commonly used methods. Furthermore, the method described here does not require a priori knowledge of the systematic biases in a given data set. These improvements can be attributed to the inclusion of variables other than average intensity during normalization.
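The conventional step the abstract mentions (scaling so that the median log-ratio is 0) and the idea of correcting against a variable other than average intensity can be illustrated with a minimal sketch. This is not the authors' stepwise procedure: the function names, the toy intensities, and the equal-count binning by retention time are all assumptions chosen for illustration.

```python
import math
from statistics import median


def median_normalize(intensities_a, intensities_b):
    """Return log2(B/A) ratios shifted so that their median is 0.

    Assumes most peptides do not change between the two runs.
    """
    log_ratios = [math.log2(b / a) for a, b in zip(intensities_a, intensities_b)]
    shift = median(log_ratios)
    return [r - shift for r in log_ratios]


def binned_median_normalize(log_ratios, covariate, n_bins=2):
    """Subtract the median log-ratio within equal-count bins of a covariate.

    The covariate could be retention time, precursor m/z, or peptide length.
    This binning scheme is a hypothetical stand-in, not the paper's model.
    """
    order = sorted(range(len(log_ratios)), key=lambda i: covariate[i])
    corrected = [0.0] * len(log_ratios)
    bin_size = math.ceil(len(order) / n_bins)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        shift = median([log_ratios[i] for i in idx])
        for i in idx:
            corrected[i] = log_ratios[i] - shift
    return corrected


# Toy data (hypothetical): run B is about 2x run A, and the last peptide
# additionally shows a real change.
run_a = [100.0, 200.0, 400.0, 800.0, 1600.0]
run_b = [200.0, 400.0, 800.0, 1600.0, 12800.0]
normalized = median_normalize(run_a, run_b)

# Toy data (hypothetical): the bias flips sign between early and late
# retention times, which a single global shift cannot remove.
rt_ratios = [0.5, 0.5, 0.5, -0.5, -0.5, -0.5]
retention_time = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
rt_corrected = binned_median_normalize(rt_ratios, retention_time, n_bins=2)
```

In the first toy case the global shift removes the twofold offset while leaving the genuinely changed peptide visible. In the second, a global median shift would leave both halves biased, whereas correcting within retention-time bins centers each half.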


