Detecting differential protein expression in large-scale population proteomics.

Author information

Ryu So Young, Qian Wei-Jun, Camp David G, Smith Richard D, Tompkins Ronald G, Davis Ronald W, Xiao Wenzhong

Affiliations

Stanford Genome Technology Center, Stanford University, Stanford, CA 94305, USA; Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA; and Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.

Publication information

Bioinformatics. 2014 Oct;30(19):2741-6. doi: 10.1093/bioinformatics/btu341. Epub 2014 Jun 12.

Abstract

MOTIVATION

Mass spectrometry (MS)-based high-throughput quantitative proteomics shows great potential in large-scale clinical biomarker studies, identifying and quantifying thousands of proteins in biological samples. However, there are unique challenges in analyzing the quantitative proteomics data. One issue is that the quantification of a given peptide is often missing in a subset of the experiments, especially for less abundant peptides. Another issue is that different MS experiments of the same study have significantly varying numbers of peptides quantified, which can result in more missing peptide abundances in an experiment that has a smaller total number of quantified peptides. To detect as many biomarker proteins as possible, it is necessary to develop bioinformatics methods that appropriately handle these challenges.
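The two missingness mechanisms described above (abundance-dependent dropout, and experiments that quantify fewer peptides overall) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not SALPS itself: the logistic detection curve, the per-experiment `depth_offsets`, and every parameter value (`slope`, `center`, noise levels) are hypothetical choices made only for illustration.

```python
from __future__ import annotations

import math
import random
from typing import Iterable, Optional


def logistic(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_missingness(
    n_peptides: int = 2000,
    depth_offsets: tuple[float, ...] = (1.5, 0.5, -0.5, -1.5),
    slope: float = 1.5,
    center: float = 20.0,
    seed: int = 0,
) -> tuple[list[float], list[list[Optional[float]]]]:
    """Simulate log2 peptide intensities with two hypothetical missingness mechanisms.

    Returns the true abundances and a peptides x experiments matrix in which
    missing (unquantified) values are None.
    """
    rng = random.Random(seed)
    # Hypothetical "true" log2 abundance of each peptide.
    true_abundance = [rng.gauss(center, 2.0) for _ in range(n_peptides)]
    observed = []
    for mu in true_abundance:
        row = []
        for offset in depth_offsets:
            value = mu + rng.gauss(0.0, 0.5)  # per-experiment measurement noise
            # Mechanism 1: detection probability rises with measured intensity.
            # Mechanism 2: an experiment with a lower offset detects fewer peptides.
            p_detect = logistic(slope * (value - center) + offset)
            row.append(value if rng.random() < p_detect else None)
        observed.append(row)
    return true_abundance, observed


def missing_rate(values: Iterable[Optional[float]]) -> float:
    """Fraction of entries that are missing (None)."""
    values = list(values)
    return sum(v is None for v in values) / len(values)


if __name__ == "__main__":
    abundance, data = simulate_missingness()
    n_exp = len(data[0])
    # Split peptides into low- and high-abundance halves by true abundance.
    order = sorted(range(len(abundance)), key=abundance.__getitem__)
    half = len(order) // 2
    low = [data[i][j] for i in order[:half] for j in range(n_exp)]
    high = [data[i][j] for i in order[half:] for j in range(n_exp)]
    print(f"missing rate, low-abundance half:  {missing_rate(low):.2f}")
    print(f"missing rate, high-abundance half: {missing_rate(high):.2f}")
    for j in range(n_exp):
        print(f"experiment {j}: missing rate {missing_rate(row[j] for row in data):.2f}")
```

Running the script should show a higher missing rate for the low-abundance half and a missing rate that grows from experiment 0 to experiment 3; the exact values depend entirely on the hypothetical parameters. The logistic detection curve is just one convenient way to make missingness depend on intensity; it is not a statement of how SALPS models missing values.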

RESULTS

We propose Significance Analysis for Large-scale Proteomics Studies (SALPS), a method that handles missing peptide intensity values arising from the two mechanisms described above. Our model performs robustly on both simulated data and proteomics data from a large clinical study. Because variation in patient sample quality and drift in instrument performance are unavoidable in clinical studies conducted over several years, we believe that our approach will be useful for analyzing large-scale clinical proteomics data.

AVAILABILITY AND IMPLEMENTATION

R code for SALPS is available at http://www.stanford.edu/%7eclairesr/software.html.
