Sadygov Rovshan G, Eng Jimmy, Durr Eberhard, Saraf Anita, McDonald Hayes, MacCoss Michael J, Yates John R
Department of Cell Biology, SR-25, The Scripps Research Institute, North Torrey Pines Road, La Jolla, California 92037, USA.
J Proteome Res. 2002 May-Jun;1(3):211-5. doi: 10.1021/pr015514r.
We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate the search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identification. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of the software used to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions could be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of the +2 and +3 charge states. We have designed and implemented a computational approach to narrow the charge states of precursor ions from nominal-resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and the fragment ions contained in the tandem mass spectrum. For each possible charge state, the program calculates the expected fragment ions that account for precursor ion m/z values. If the resulting number for either charge state is less than an empirically determined threshold value, then the copy of the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value, then both the +2 and +3 copies of the spectrum are kept. We present a comparison of results from protein identification experiments with and without using 2to3. It is shown that, by determining the charge state and eliminating poor-quality spectra, 2to3 decreases the number of spectral files to be searched without affecting the search results. This decrease reduces the computing requirements and the researcher effort needed to analyze the results.
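The thresholding rule described for 2to3 can be illustrated with a minimal sketch. The abstract does not specify how the per-charge-state number is computed, so the scoring below is an assumption: it counts pairs of peaks whose m/z values sum to the value expected for singly charged complementary fragments under each charge hypothesis. The function names, the 0.5 Da tolerance, and the threshold of 3 are hypothetical and are not taken from the 2to3 code.

```python
PROTON = 1.00728  # proton mass in Da


def complementary_pairs(peaks, precursor_mz, charge, tol=0.5):
    """Count disjoint peak pairs consistent with one charge hypothesis.

    Assumption: two singly charged complementary fragments have m/z values
    that sum to M + 2*PROTON, where M = charge * (precursor_mz - PROTON)
    is the neutral precursor mass implied by the hypothesis.
    """
    target = charge * precursor_mz - (charge - 2) * PROTON
    mzs = sorted(peaks)
    lo, hi = 0, len(mzs) - 1
    count = 0
    # Two-pointer sweep over the sorted m/z values.
    while lo < hi:
        total = mzs[lo] + mzs[hi]
        if abs(total - target) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return count


def assign_charges(peaks, precursor_mz, threshold=3, tol=0.5):
    """Return the charge states (+2 and/or +3) whose spectrum copy is kept.

    A charge state is dropped when its score is below the threshold.
    An empty list means both copies are removed.
    """
    scores = {z: complementary_pairs(peaks, precursor_mz, z, tol) for z in (2, 3)}
    return [z for z, n in scores.items() if n >= threshold]


# Synthetic peak list (m/z values only; intensities are ignored in this sketch).
peaks = [150.3, 200.0, 300.0, 400.0, 600.0, 700.0, 800.0]
kept = assign_charges(peaks, precursor_mz=500.0)  # kept == [2]
```

In this synthetic example three peak pairs sum to 1000.0, the value expected for a +2 precursor at m/z 500.0, so only the +2 copy is kept. A spectrum that scores below the threshold under both hypotheses yields an empty list, which corresponds to discarding it as poor quality.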