Eng Jimmy K, Fischer Bernd, Grossmann Jonas, MacCoss Michael J
Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
J Proteome Res. 2008 Oct;7(10):4598-602. doi: 10.1021/pr800420s. Epub 2008 Sep 6.
The SEQUEST program was the first tool, and remains one of the most widely used, for assigning a peptide sequence from a database to a tandem mass spectrum. The cross-correlation score is the primary scoring function in SEQUEST, and it is this score that makes the tool particularly sensitive. Unfortunately, the score is computationally expensive to calculate. To keep the cost manageable, SEQUEST first ranks candidates with a fast but less sensitive preliminary score and computes the cross-correlation only for the top 500 peptides returned by that preliminary score. Classically, the cross-correlation score has been calculated using fast Fourier transforms (FFTs) to generate the full correlation function. We describe an alternative method of calculating the cross-correlation score that does not require FFTs and can be computed in a fraction of the time required by the FFT-based calculation. The fast calculation allows all candidate peptides to be scored with the cross-correlation function, potentially reducing the need for the preliminary score. It also enables an E-value significance calculation based on the distribution of cross-correlation scores over all candidate peptide sequences obtained from a sequence database.
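The abstract does not spell out the FFT-free calculation, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the SEQUEST-style definition xcorr = R_0 - (1/150) * sum of R_tau for tau = -75..+75, tau != 0, where R_tau = sum_i y_i * x_(i+tau), x is the binned experimental spectrum and y is a candidate's binned theoretical spectrum. The +/-75 window and the 1/150 normalization are assumptions here; a given SEQUEST version may use slightly different constants. Because every R_tau is linear in y, the background term can be moved onto the spectrum: x is preprocessed once as x'_i = x_i - (1/150) * sum over the non-zero offsets of x_(i+tau), after which each candidate's xcorr is the single dot product y . x'. The function names `xcorr_by_offsets`, `preprocess`, and `fast_xcorr` are hypothetical. `xcorr_by_offsets` evaluates every offset directly, standing in for the full correlation function only to check the fast version. Spectrum binning, intensity processing, and theoretical-spectrum construction are omitted.

```python
import math
import random
from collections.abc import Sequence

# Assumed background window: offsets tau = -75..+75 with tau != 0,
# averaged over 2 * OFFSET = 150 shifts. Both constants are assumptions.
OFFSET = 75


def xcorr_by_offsets(x: Sequence[float], y: Sequence[float], offset: int = OFFSET) -> float:
    """Reference xcorr: R_0 minus the mean of R_tau over the non-zero offsets.

    R_tau = sum_i y[i] * x[i + tau]; bins shifted past either end count as 0.
    """
    if len(x) != len(y):
        raise ValueError("spectra must be binned to the same length")
    n = len(x)

    def r(tau: int) -> float:
        return sum(y[i] * x[i + tau] for i in range(n) if 0 <= i + tau < n)

    background = sum(r(tau) for tau in range(-offset, offset + 1) if tau != 0)
    return r(0) - background / (2 * offset)


def preprocess(x: Sequence[float], offset: int = OFFSET) -> list[float]:
    """Fold the background term into the spectrum once:
    x'_i = x_i - (sum of neighbours within +/-offset, centre excluded) / (2 * offset).
    """
    n = len(x)
    prefix = [0.0]  # prefix[k] = x[0] + ... + x[k-1], so each window sum is O(1)
    for v in x:
        prefix.append(prefix[-1] + v)
    x_prime = []
    for i in range(n):
        lo, hi = max(0, i - offset), min(n, i + offset + 1)
        neighbours = prefix[hi] - prefix[lo] - x[i]  # window sum without the centre bin
        x_prime.append(x[i] - neighbours / (2 * offset))
    return x_prime


def fast_xcorr(x_prime: Sequence[float], y: Sequence[float]) -> float:
    """Per-candidate score: one dot product with the preprocessed spectrum."""
    if len(x_prime) != len(y):
        raise ValueError("spectra must be binned to the same length")
    return sum(a * b for a, b in zip(x_prime, y))


if __name__ == "__main__":
    rng = random.Random(0)
    n_bins = 400
    # Synthetic sparse "experimental" spectrum and hypothetical candidate spectra.
    spectrum = [rng.random() if rng.random() < 0.1 else 0.0 for _ in range(n_bins)]
    x_prime = preprocess(spectrum)  # computed once per spectrum
    for k in range(3):
        theoretical = [1.0 if rng.random() < 0.05 else 0.0 for _ in range(n_bins)]
        fast = fast_xcorr(x_prime, theoretical)
        slow = xcorr_by_offsets(spectrum, theoretical)
        assert math.isclose(fast, slow, rel_tol=1e-9, abs_tol=1e-9)
        print(f"candidate {k}: xcorr = {fast:.6f}")
```

In this design, all per-spectrum work moves into `preprocess`, which runs once in linear time thanks to the prefix sums. Each candidate then costs only a dot product, which is what makes scoring every candidate affordable. With sparse theoretical spectra, the dot product only needs to visit the bins that contain fragment ions.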