Potential for false positive identifications from large databases through tandem mass spectrometry.

Author information

Cargile Benjamin J, Bundy Jonathan L, Stephenson James L

Affiliation

Mass Spectrometry Research Program, Research Triangle Institute, 3040 Cornwallis Road, Research Triangle Park, North Carolina 27709, USA.

Publication information

J Proteome Res. 2004 Sep-Oct;3(5):1082-5. doi: 10.1021/pr049946o.

DOI:10.1021/pr049946o

PMID:15473699

Abstract

The biomedical research community at large is increasingly employing shotgun proteomics for large-scale identification of proteins from enzymatic digests. Typically, proteins and peptides are identified from tandem mass spectral data by matching each experimentally generated tandem mass spectrum to its best theoretical match in a protein database. Here, we present the potential difficulties of using such an approach without statistical consideration of the false positive rate, especially when large databases, such as those encountered for eukaryotes, are considered. This is illustrated by searching a dataset generated from a multidimensional separation of a eukaryotic tryptic digest against an in silico generated random protein database. The search produced a significant number of positive matches, even when previously suggested score filtering criteria were applied.
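The random-database control described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the random protein database was built, so the sketch assumes each random sequence keeps the length of a real protein and draws residues independently from the pooled amino-acid frequencies of the real database. The function names `random_protein_database` and `false_positive_fraction`, and the idea of a single numeric score threshold, are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def random_protein_database(proteins, seed=0):
    """Build one random sequence per input protein.

    Assumption (not from the paper): each random sequence has the same
    length as its real counterpart, and residues are drawn independently
    from the pooled amino-acid frequencies of the real database.
    """
    rng = random.Random(seed)
    counts = Counter("".join(proteins))
    residues = sorted(counts)
    weights = [counts[r] for r in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=len(p)))
        for p in proteins
    ]


def false_positive_fraction(random_db_scores, threshold):
    """Fraction of random-database matches that pass a score filter.

    Every match to a random sequence is wrong by construction, so any
    match at or above the threshold counts as a false positive.
    """
    if not random_db_scores:
        return 0.0
    passing = sum(1 for score in random_db_scores if score >= threshold)
    return passing / len(random_db_scores)


# Toy usage with made-up sequences and scores.
real_db = ["MKTAYIAKQR", "GAVLIMKDE"]
random_db = random_protein_database(real_db, seed=1)
fraction = false_positive_fraction([1.0, 2.5, 3.0, 0.5], threshold=2.5)
```

In practice the scores would come from running the same search engine and the same filtering criteria against the random database, which this sketch does not attempt to reproduce.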