Shen Changyu, Sheng Quanhu, Dai Jie, Li Yixue, Zeng Rong, Tang Haixu
Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Proteomics. 2009 Jan;9(1):194-204. doi: 10.1002/pmic.200800330.
Controlling and estimating false positives in peptide identification by MS is critical for reliable inference at the protein level and for downstream bioinformatics analysis. Approaches based on searching against decoy databases have become popular because of their conceptual simplicity and ease of implementation. Although various decoy search strategies have been proposed, few studies have investigated how they differ in performance. Using datasets collected on a mixture of model proteins, we demonstrate that a single search against the target database combined with its reversed version offers a good balance between performance and simplicity. In particular, both the accuracy of the estimated number of false positives and the sensitivity are at least comparable to those of the other procedures examined in this study. We also show that scrambling while preserving the frequency of amino acid words can potentially improve the accuracy of the false positive estimate, although more studies are needed to determine the optimal scrambling procedure for specific conditions and to investigate how the estimate varies across repeated scrambling.
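The single-search strategy favored above can be illustrated with a minimal sketch. It assumes that every target protein is reversed to form a decoy of equal size, that target and decoy sequences are placed in one database and searched once, that each spectrum contributes a single best-scoring peptide-spectrum match (PSM) with higher scores being better, and that decoy entries carry the hypothetical prefix `REV_`. None of these names come from the paper, and the scoring and search steps themselves are not modeled.

```python
"""Sketch of a reversed-decoy, single-search false positive estimate.

Assumptions (not from the paper): one best PSM per spectrum, higher score is
better, and decoy entries are marked by the hypothetical prefix "REV_".
"""

DECOY_PREFIX = "REV_"  # hypothetical naming convention for decoy entries


def reverse_decoy(targets: dict[str, str]) -> dict[str, str]:
    """Return one decoy entry per target protein, with its sequence reversed."""
    return {DECOY_PREFIX + name: seq[::-1] for name, seq in targets.items()}


def concatenated_database(targets: dict[str, str]) -> dict[str, str]:
    """Put target and reversed decoy sequences into one database for a single search."""
    db = dict(targets)
    db.update(reverse_decoy(targets))
    return db


def estimate_false_positives(
    psms: list[tuple[str, float]], threshold: float
) -> tuple[int, int, float]:
    """Count target and decoy PSMs scoring at or above `threshold`.

    Because the decoy part is the same size as the target part, a random match
    is assumed equally likely to land in either, so the decoy count serves as
    the estimated number of false positives among the target hits, and
    decoys / targets as the estimated false discovery rate.
    """
    n_target = n_decoy = 0
    for protein, score in psms:
        if score < threshold:
            continue  # below the acceptance threshold
        if protein.startswith(DECOY_PREFIX):
            n_decoy += 1
        else:
            n_target += 1
    fdr = n_decoy / n_target if n_target else 0.0
    return n_target, n_decoy, fdr


if __name__ == "__main__":
    print(concatenated_database({"P1": "PEPTIDEK"}))
    # {'P1': 'PEPTIDEK', 'REV_P1': 'KEDITPEP'}
    toy_psms = [("P1", 50.0), ("REV_P1", 20.0), ("P2", 45.0),
                ("P3", 35.0), ("REV_P2", 36.0)]
    print(estimate_false_positives(toy_psms, threshold=30.0))
    # (3, 1, 0.3333333333333333)
```

The toy PSM list is made up purely for illustration. Using the decoy count directly as the false positive estimate for target hits is one common convention for concatenated searches; other formulas, such as 2D/(T+D) computed over all accepted hits, are also in use, and the sketch does not claim to reproduce the exact estimator evaluated in the study.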