Department of Informatics, Federal University of Viçosa, 36570-000 Minas Gerais, Brazil.
BMC Genomics. 2012;13 Suppl 5(Suppl 5):S4. doi: 10.1186/1471-2164-13-S5-S4. Epub 2012 Oct 19.
The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied to identify proteins in complex mixtures. This method produces thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted and matched against experimentally derived mass spectral data. After the database search, the correctness of the resulting peptide-spectrum matches (PSMs) must also be evaluated algorithmically, because manual curation of such large datasets would be impractical. The target-decoy database strategy is widely used for this evaluation. Nonetheless, it has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem in which sensitivity is maximized. This method significantly increases the number of PSMs retrieved at a fixed error rate. However, the MUDE model is constructed so that only linear decision boundaries separate correct from incorrect PSMs. In addition, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant gain in sensitivity.
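To make the target-decoy idea concrete, the following is a minimal sketch, not the procedure used by MUDE or MUMAL. It assumes each PSM carries a single score (higher is better) and a flag saying whether it matched a decoy sequence, and it uses one common estimator of the error rate, the ratio of decoy to target PSMs above a score cutoff. The function name `best_threshold` and the toy scores are hypothetical. The sketch returns the loosest cutoff whose estimated error rate stays within a chosen limit, which is the single-score form of "retrieve as many PSMs as possible at a fixed error rate".

```python
from typing import List, Tuple

# Hypothetical PSM record for illustration: (score, is_decoy), higher score = better match.
PSM = Tuple[float, bool]


def best_threshold(psms: List[PSM], max_fdr: float) -> Tuple[float, int]:
    """Return (score cutoff, accepted target PSMs) for the loosest cutoff whose
    decoy-based error estimate (decoys / targets at or above the cutoff) is
    at most max_fdr. Returns (inf, 0) if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = (float("inf"), 0)
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff once every PSM sharing this score has been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
           (7.5, False), (7.0, False), (6.4, True), (6.0, False)]
    cutoff, accepted = best_threshold(toy, max_fdr=0.25)
    print(f"cutoff={cutoff}, accepted target PSMs={accepted}")
```

A single cutoff on one score is the simplest possible decision rule. Methods that combine several scores, whether with linear or nonlinear boundaries, aim to accept more target PSMs under the same error limit.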
Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, giving it a higher chance of retrieving more true positives. Furthermore, it needs only a few iterations to achieve high sensitivity, which strikingly shortens the running time of the whole process. Experiments show that our method retrieves a considerably higher number of PSMs than standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches.
Our approach not only enhances computational performance, and thus the turnaround time of MS-based experiments in proteomics, but also improves the information content through higher proteome coverage. This improvement, for instance, increases the chance of identifying important drug targets or biomarkers for drug development or molecular diagnostics.