Heller Manfred, Ye Mingliang, Michel Philippe E, Morier Patrick, Stalder Daniel, Jünger Martin A, Aebersold Ruedi, Reymond Frédéric, Rossier Joël S
DiagnoSwiss SA, Monthey, Switzerland.
J Proteome Res. 2005 Nov-Dec;4(6):2273-82. doi: 10.1021/pr050193v.
A very popular approach in proteomics is the so-called "shotgun LC-MS/MS" strategy. In its most commonly used form, a total protein digest is separated by ion-exchange fractionation in the first dimension, followed by off-line or on-line RP LC-MS/MS. We replaced the first dimension with liquid-phase isoelectric focusing using the Off-Gel device, which produced 15 fractions. Because peptides are separated by isoelectric point in the first dimension and by hydrophobicity in the second, these experimentally derived parameters (pI and retention time, R(T)) can be used to validate candidate peptide identifications. We applied this strategy to a cellular extract of Drosophila Kc167 cells and identified peptides with two database search engines, PHENYX and SEQUEST, with PeptideProphet validation of the SEQUEST results. PHENYX returned 7582 potential peptide identifications and SEQUEST 7629. Validation with PeptideProphet reduced the SEQUEST results to 2006 identifications. Validation of the PeptideProphet, SEQUEST and PHENYX results by the pI and R(T) parameters confirmed 1837 PeptideProphet identifications, while another 1130 peptides among the remaining SEQUEST results were found to be likely hits. For PHENYX, the validation established a robust p-value threshold of <1 × 10^-4, which by itself sets the confidence of correct identification to >95%; a final count of 2034 highly confident peptide identifications was obtained after pI and R(T) validation. Although the PeptideProphet and PHENYX datasets both have very high confidence, the overlap of common identifications was only 79.4%, which can be explained by the fact that the data were interpreted by searching different protein databases with two search engines that use different algorithms. The approach used in this study allows automated and improved data validation for shotgun proteomics projects, producing MS/MS peptide identification results of very high confidence.
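The pI-based validation idea described in the abstract can be illustrated with a minimal sketch: compute a theoretical pI for a candidate peptide sequence and check whether it is consistent with the Off-Gel fraction in which the peptide was observed. Everything specific here is an assumption for illustration and is not taken from the study: the pKa values, the linear pH 3 to 10 gradient across the 15 fractions, the 0.5 pH unit tolerance, and the function names (`net_charge`, `theoretical_pi`, `fraction_window`, `pi_consistent`) are all hypothetical. The retention-time check is not shown.

```python
# Illustrative pKa values (an assumption; the abstract does not state which set was used).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of an unmodified peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def theoretical_pi(sequence: str, tol: float = 1e-4) -> float:
    """Locate the pH of zero net charge by bisection over pH 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH, so bisection is safe.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def fraction_window(fraction: int, n_fractions: int = 15,
                    ph_min: float = 3.0, ph_max: float = 10.0):
    """pH window of a 1-based fraction, assuming a linear gradient (hypothetical)."""
    width = (ph_max - ph_min) / n_fractions
    low = ph_min + (fraction - 1) * width
    return low, low + width


def pi_consistent(sequence: str, fraction: int, tolerance: float = 0.5) -> bool:
    """True if the theoretical pI lies within the fraction window plus a tolerance."""
    low, high = fraction_window(fraction)
    return low - tolerance <= theoretical_pi(sequence) <= high + tolerance
```

Under these assumptions, an acidic peptide such as `"DDDD"` would be accepted in the first fraction and rejected in the last one, whereas a basic peptide such as `"KKKK"` would be rejected in the first fraction. A real workflow would need the actual gradient of the strip, a calibrated pKa set, and handling of modified residues.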