Carvalho Paulo C, Fischer Juliana S G, Chen Emily I, Yates John R, Barbosa Valmir C
Systems Engineering and Computer Science Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
BMC Bioinformatics. 2008 Jul 21;9:316. doi: 10.1186/1471-2105-9-316.
A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method for semi-relative protein quantitation in shotgun proteomics data that correlates the number of tandem mass spectra obtained for each protein, its "spectral count", with its abundance in a mixture. However, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents during proteolytic digestion. This approach introduces new data analysis challenges, because replicate readings are not acquired.
To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state, or with assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs with multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies.
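To make the combined ACFold criterion concrete, the following is a minimal sketch and not PatternLab's actual implementation. It assumes one spectral-count profile per state, uses the Audic-Claverie probability p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1+N2/N1)^(x+y+1)], and assumes the Benjamini-Hochberg procedure for false-discovery-rate control. The function names, the doubled-tail p-value, the pseudocount of 1 in the fold change, and the default thresholds are all illustrative assumptions.

```python
import math


def ac_log_prob(y, x, n1, n2):
    """Log of the Audic-Claverie probability of count y (total n2) given count x (total n1)."""
    ratio = n2 / n1
    return (y * math.log(ratio)
            + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
            - (x + y + 1) * math.log(1.0 + ratio))


def ac_pvalue(x, y, n1, n2):
    """Tail p-value under the AC distribution (assumption: smaller tail, doubled, capped at 1)."""
    below = sum(math.exp(ac_log_prob(k, x, n1, n2)) for k in range(y))
    lower = min(1.0, below + math.exp(ac_log_prob(y, x, n1, n2)))  # P(Y <= y)
    upper = max(0.0, 1.0 - below)                                   # P(Y >= y)
    return min(1.0, 2.0 * min(lower, upper))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def acfold_select(counts_a, counts_b, min_fold=2.0, max_q=0.05):
    """Return proteins passing both the fold-change and the FDR-adjusted AC-test thresholds."""
    n1, n2 = sum(counts_a.values()), sum(counts_b.values())
    proteins = sorted(set(counts_a) | set(counts_b))
    pvalues, folds = [], []
    for protein in proteins:
        x, y = counts_a.get(protein, 0), counts_b.get(protein, 0)
        pvalues.append(ac_pvalue(x, y, n1, n2))
        # Pseudocount of 1 avoids division by zero (an illustrative assumption).
        fa, fb = (x + 1) / n1, (y + 1) / n2
        folds.append(max(fa / fb, fb / fa))
    qvalues = benjamini_hochberg(pvalues)
    return [p for p, f, q in zip(proteins, folds, qvalues)
            if f >= min_fold and q <= max_q]


# Hypothetical spectral counts for two states (made-up numbers, for illustration only).
state_a = {"P1": 100, "P2": 50, "P3": 5, "P4": 40}
state_b = {"P1": 100, "P2": 52, "P3": 60, "P4": 38}
selected = acfold_select(state_a, state_b)
print(selected)
```

Requiring both thresholds guards against two failure modes: a large fold change between very low counts, which is statistically weak, and a significant but tiny change between very high counts.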
PatternLab offers easy, unified access to a variety of feature selection and normalization strategies, each with its own niche. Graphing tools are also available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.