Brusniak Mi-Youn, Bodenmiller Bernd, Campbell David, Cooke Kelly, Eddes James, Garbutt Andrew, Lau Hollis, Letarte Simon, Mueller Lukas N, Sharma Vagisha, Vitek Olga, Zhang Ning, Aebersold Ruedi, Watts Julian D
Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103, USA.
BMC Bioinformatics. 2008 Dec 16;9:542. doi: 10.1186/1471-2105-9-542.
Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable to each other in terms of functionality, user interface, or information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.
We have developed Corra, a computational framework and set of tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, as well as statistical algorithms that were originally developed for microarray data analysis and are appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intensive data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
The Corra computational framework leverages computational innovation to enable biologists and other researchers to process, analyze and visualize LC-MS data using what would otherwise be a complex suite of tools that is not user-friendly. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such it addresses an unmet need in the LC-MS proteomics field.
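To make the idea of selecting peptide features under a controlled false-discovery rate concrete, the following is a minimal sketch only. The abstract does not say which FDR procedure Corra uses; the Benjamini-Hochberg step-up procedure is assumed here because it is a common choice for this kind of analysis. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from Corra.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; this is an
    illustration, not Corra's documented method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the features, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-feature p-values from a differential abundance test.
feature_p_values = [0.001, 0.045, 0.025, 0.20, 0.008]
significant = benjamini_hochberg(feature_p_values, alpha=0.05)
```

In a workflow like the one described, the features flagged as significant would be the candidates passed on for targeted MS/MS identification.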