Eriksson Jan, Fenyö David
Swedish University of Agricultural Sciences, Box 7015, SE-750 07 Uppsala, Sweden.
J Proteome Res. 2004 Sep-Oct;3(5):979-82. doi: 10.1021/pr0499343.
The potential for obtaining a true mass spectrometric protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in the mass spectrometric data. Current methods can never prove definitively that a result is true, but an appropriate choice of algorithm can provide a measure of the statistical risk that a result is false, i.e., the statistical significance. We recently demonstrated an algorithm, Probity, which assigns the statistical significance to each result. For any choice of algorithm, the difficulty of obtaining statistically significant results depends on the number of protein sequences in the sequence collection searched. By simulations of random protein identifications and using the Probity algorithm, we here demonstrate explicitly how the statistical significance depends on the number of sequences searched. We also provide an example of how the practitioner's choice of taxonomic constraints influences the statistical significance.
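The dependence on the number of sequences searched can be illustrated with a deliberately simplified model. This is not the Probity algorithm. It assumes that each unrelated sequence in the collection produces a chance match, independently, with a fixed probability p_single (real scores are correlated and not identically distributed, so this is only an approximation). Under that assumption, the risk that at least one of N sequences matches by chance is 1 - (1 - p_single)^N, which is close to N * p_single when that product is small. The function names and parameters below are hypothetical, chosen for illustration, and the Monte Carlo routine is a toy stand-in for "simulations of random protein identifications," not the authors' procedure.

```python
import random


def false_positive_risk(p_single: float, n_sequences: int) -> float:
    """Chance that at least one of n_sequences unrelated sequences matches by
    chance, assuming each does so independently with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_sequences


def required_single_p(alpha: float, n_sequences: int) -> float:
    """Per-sequence chance-match probability that keeps the collection-wide
    risk at alpha when n_sequences sequences are searched (same toy model)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if n_sequences < 1:
        raise ValueError("n_sequences must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_sequences)


def simulate_risk(p_single: float, n_sequences: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of false_positive_risk: each trial is one random
    identification against a collection containing only unrelated sequences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A trial counts as a false identification if any sequence matches by chance.
        if any(rng.random() < p_single for _ in range(n_sequences)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-sequence chance-match probability
    for n in (10_000, 100_000, 1_000_000):  # hypothetical collection sizes
        print(f"N={n:>9,}: risk={false_positive_risk(p, n):.4f}, "
              f"p needed for 5% risk={required_single_p(0.05, n):.2e}")
```

With these hypothetical numbers, the same per-sequence probability gives a risk of roughly 1%, 9.5% and 63% as N grows from ten thousand to one million, and the per-sequence probability needed to hold the risk at 5% shrinks roughly in proportion to 1/N. Restricting a search by taxonomy lowers N, which is why, under this model, the same match can become more significant; actual magnitudes depend on the scoring algorithm and the data.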