Karpievitch Yuliya V, Polpitiya Ashoka D, Anderson Gordon A, Smith Richard D, Dabney Alan R
Pacific Northwest National Laboratory, Richland, WA 99352.
Ann Appl Stat. 2010;4(4):1797-1823. doi: 10.1214/10-AOAS341.
Mass spectrometry-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. Though recent years have seen tremendous improvements in instrument performance and computational tools, significant challenges remain, and there are many opportunities for statisticians to make important contributions. In the most widely used "bottom-up" approach to proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage; the resulting peptide products are then separated based on chemical or physical properties and analyzed using a mass spectrometer. The two fundamental challenges in the analysis of bottom-up MS-based proteomics are (1) identifying the proteins that are present in a sample and (2) quantifying the abundance levels of the identified proteins. Both of these challenges require knowledge of the biological and technological context that gives rise to the observed data, as well as the application of sound statistical principles for estimation and inference. We present an overview of bottom-up proteomics and outline the key statistical issues that arise in protein identification and quantification.