Sadygov Rovshan G, Cociorva Daniel, Yates John R
Department of Cell Biology, The Scripps Research Institute, La Jolla, California 92037, USA.
Nat Methods. 2004 Dec;1(3):195-202. doi: 10.1038/nmeth725.
Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale behind the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and a sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification, and the current challenges and limitations of this approach for protein identification.