
SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data.

Author information

Xiao Di, Lin Michael, Liu Chunlei, Geddes Thomas A, Burchfield James G, Parker Benjamin L, Humphrey Sean J, Yang Pengyi

Affiliations

Computational Systems Biology Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia.

School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia.

Publication information

NAR Genom Bioinform. 2023 Nov 6;5(4):lqad099. doi: 10.1093/nargab/lqad099. eCollection 2023 Dec.

Abstract

A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of the substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positives) and the high noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a 'pseudo-positive' learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce SnapKin, an ensemble deep learning method for predicting substrates of kinases from large-scale phosphoproteomics data, which incorporates the above two learning strategies into a 'snapshot' ensemble learning algorithm. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
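
The abstract does not spell out how a 'snapshot' ensemble works, so the following is a minimal sketch of the general idea only, and not SnapKin's actual implementation. It assumes the usual snapshot-ensemble recipe: one training run with a cyclic, cosine-annealed learning rate, a copy of the model saved at the end of each cycle, and predictions averaged over the saved copies. A tiny logistic model stands in for the deep network, the data are toy values, and all function names and parameters (`cosine_cyclic_lr`, `train_snapshots`, `ensemble_predict`, `n_cycles`, `lr_max`) are hypothetical.

```python
import math
import random


def cosine_cyclic_lr(step, total_steps, n_cycles, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = step % cycle_len  # position within the current cycle
    return 0.5 * lr_max * (math.cos(math.pi * pos / cycle_len) + 1.0)


def predict(weights, x):
    """Logistic model; weights[0] is the bias term."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_snapshots(X, y, total_steps=300, n_cycles=3, lr_max=0.5, seed=0):
    """Run SGD once and keep a copy of the weights at the end of each cycle."""
    rng = random.Random(seed)
    weights = [0.0] * (len(X[0]) + 1)
    cycle_len = math.ceil(total_steps / n_cycles)
    snapshots = []
    for step in range(total_steps):
        lr = cosine_cyclic_lr(step, total_steps, n_cycles, lr_max)
        i = rng.randrange(len(X))          # pick one training example
        err = predict(weights, X[i]) - y[i]  # gradient of log-loss w.r.t. z
        weights[0] -= lr * err
        for j, xj in enumerate(X[i]):
            weights[j + 1] -= lr * err * xj
        if (step + 1) % cycle_len == 0:    # learning rate is near zero here
            snapshots.append(list(weights))
    return snapshots


def ensemble_predict(snapshots, x):
    """Average the predicted probabilities of all saved snapshots."""
    return sum(predict(w, x) for w in snapshots) / len(snapshots)


# Toy data (made up for illustration): two features per site, label 1 = substrate.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]

snapshots = train_snapshots(X, y)
scores = [ensemble_predict(snapshots, x) for x in X]
```

The appeal of this scheme is that several ensemble members come out of a single training run, because each learning-rate restart pushes the model away from its current solution before it settles again.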


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcd1/10632189/a0114392d28b/lqad099fig1.jpg
