
A cost-sensitive online learning method for peptide identification.

Affiliations

College of Science, China University of Petroleum, Changjiang West Road, Qingdao, 266580, China.

School of Engineering and Applied Science, Western Kentucky University, Bowling Green, 42101, KY, USA.

Publication information

BMC Genomics. 2020 Apr 25;21(1):324. doi: 10.1186/s12864-020-6693-y.

Abstract

BACKGROUND

Post-database search is a key procedure in peptide identification with tandem mass spectrometry (MS/MS): it refines the peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, challenges remain on large-scale datasets and on datasets with an unbalanced distribution of PSMs. A more efficient learning strategy is required to improve the accuracy of peptide identification on such datasets. While complex learning models have greater classification power, they may overfit and become computationally expensive on large-scale datasets. Kernel methods map data from the sample space to high-dimensional spaces where the relationships in the data can be simpler to model.

RESULTS

To tackle the computational challenge of using a kernel-based learning model for practical peptide identification problems, we present an online learning algorithm, OLCS-Ranker, which feeds only one training sample into the learning model at each round; as a result, the memory required for computation is significantly reduced. In addition, we propose a cost-sensitive learning model for OLCS-Ranker that assigns a larger loss to decoy PSMs than to target PSMs in the loss function.
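The two ideas in this paragraph, one training sample per round and a larger loss for decoy PSMs, can be illustrated with a minimal sketch. This is not the OLCS-Ranker algorithm itself: it assumes a plain online kernel classifier with a cost-weighted hinge loss, and the class name, the RBF kernel choice, the cost values, and the toy two-feature PSMs are all hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

class OnlineCostSensitiveKernelClassifier:
    """Illustrative online kernel learner (an assumption, not the paper's model).

    Labels: +1 for a target PSM, -1 for a decoy PSM.
    Loss per sample: cost[y] * max(0, 1 - y * f(x)), with cost[-1] > cost[+1].
    """

    def __init__(self, cost_target=1.0, cost_decoy=2.0, learning_rate=0.5, gamma=0.5):
        self.costs = {+1: cost_target, -1: cost_decoy}
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.support = []  # (coefficient, feature vector) pairs kept so far

    def score(self, x):
        """Decision value f(x) as a kernel expansion over stored samples."""
        return sum(c * rbf_kernel(s, x, self.gamma) for c, s in self.support)

    def partial_fit(self, x, y):
        """Process a single sample; store it only if it violates the margin."""
        if y * self.score(x) < 1.0:
            # Subgradient step on the weighted hinge loss: decoys get a larger step.
            self.support.append((self.learning_rate * self.costs[y] * y, x))

# Toy stream of (features, label) pairs, fed one sample per round.
stream = [
    ((1.0, 1.2), +1), ((-1.1, -0.9), -1), ((0.9, 0.8), +1),
    ((-0.8, -1.2), -1), ((1.2, 1.0), +1), ((-1.0, -1.0), -1),
]

model = OnlineCostSensitiveKernelClassifier()
for features, label in stream:
    model.partial_fit(features, label)

print(model.score((1.0, 1.0)) > 0)    # target-like point scores positive
print(model.score((-1.0, -1.0)) < 0)  # decoy-like point scores negative
```

Only the samples that violate the margin are stored, so the model never needs the full kernel matrix in memory; the larger decoy cost simply scales the update made when a decoy is misranked.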

CONCLUSIONS

The new model reduces the false discovery rate on datasets with an unbalanced distribution of PSMs. Experimental studies show that OLCS-Ranker outperforms other methods in terms of accuracy and stability, especially on such unbalanced datasets. Furthermore, OLCS-Ranker is 15-85 times faster than CRanker.
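For readers unfamiliar with how a false discovery rate is obtained from target and decoy PSMs, the sketch below shows one common target-decoy estimate: the number of accepted decoys divided by the number of accepted targets. The abstract does not state which estimator the authors use, so this formula, the function name, and the toy scores are assumptions for illustration only.

```python
def estimate_fdr(scores, is_decoy, threshold):
    """Estimate FDR among PSMs scoring at or above the threshold.

    Uses the simple ratio (#accepted decoys) / (#accepted targets),
    which is one common convention and is assumed here.
    """
    accepted = [decoy for s, decoy in zip(scores, is_decoy) if s >= threshold]
    n_decoy = sum(1 for decoy in accepted if decoy)
    n_target = len(accepted) - n_decoy
    if n_target == 0:
        return 0.0  # no accepted targets, so the ratio is undefined; report 0
    return n_decoy / n_target

# Hypothetical PSM scores and decoy flags.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
is_decoy = [False, False, True, False, False, True]

fdr = estimate_fdr(scores, is_decoy, threshold=0.5)
print(fdr)  # one decoy and three targets pass the threshold
```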


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2be7/7183122/886963d96824/12864_2020_6693_Fig1_HTML.jpg
