基于肽段的 LC-MS 蛋白质组学数据的质量控制处理的改进。

Improved quality control processing of peptide-centric LC-MS proteomics data.

机构信息

Pacific Northwest National Laboratory, Richland, WA 99352, USA.

出版信息

Bioinformatics. 2011 Oct 15;27(20):2866-72. doi: 10.1093/bioinformatics/btr479. Epub 2011 Aug 18.

DOI:10.1093/bioinformatics/btr479

PMID:21852304

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3187650/

Abstract

MOTIVATION

In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation for an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of the peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data as many of the downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values.

RESULTS

We describe a novel multivariate statistical strategy for the identification of LC-MS runs with extreme peptide abundance distributions. Comparison with current method (run-by-run correlation) demonstrates a significantly better rate of identification of outlier runs by the multivariate strategy. Simulation studies also suggest that this strategy significantly outperforms correlation alone in the identification of statistically extreme liquid chromatography-mass spectrometry (LC-MS) runs.

AVAILABILITY

https://www.biopilot.org/docs/Software/RMD.php

CONTACT

bj@pnl.gov

SUPPLEMENTARY INFORMATION

Supplementary material is available at Bioinformatics online.

摘要

动机

在分析差异肽峰强度（即丰度测量）时，LC-MS 分析中肽丰度数据质量较差会偏向下游统计分析，从而影响原本高质量数据集的生物学解释。尽管已经在保证肽鉴定质量方面付出了相当大的努力，但是到目前为止，对后续肽丰度数据矩阵的质量评估仅限于对逐个运行的相关性或单个肽成分的主观视觉检查。在处理蛋白质组学数据时，识别统计异常值是一个关键步骤，因为许多下游统计分析（例如方差分析（ANOVA））都依赖于对样本方差的准确估计，并且它们的结果受到极值的影响。

结果

我们描述了一种用于识别具有极端肽丰度分布的 LC-MS 运行的新的多变量统计策略。与当前方法（运行间相关性）的比较表明，多变量策略显著提高了异常运行的识别率。模拟研究还表明，该策略在识别统计学上极端的液相色谱-质谱（LC-MS）运行方面明显优于仅相关性。

可用性

https://www.biopilot.org/docs/Software/RMD.php

联系人

bj@pnl.gov

补充信息

补充材料可在 Bioinformatics 在线获取。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

基于肽段的 LC-MS 蛋白质组学数据的质量控制处理的改进。

Improved quality control processing of peptide-centric LC-MS proteomics data.

机构信息

出版信息

MOTIVATION

RESULTS

AVAILABILITY

CONTACT

SUPPLEMENTARY INFORMATION

动机

结果

可用性

联系人

补充信息

相似文献

引用本文的文献

本文引用的文献

基于肽段的 LC-MS 蛋白质组学数据的质量控制处理的改进。

Improved quality control processing of peptide-centric LC-MS proteomics data.

机构信息

出版信息

MOTIVATION

RESULTS

AVAILABILITY

CONTACT

SUPPLEMENTARY INFORMATION

动机

结果

可用性

联系人

补充信息

相似文献

引用本文的文献

本文引用的文献