Machine learning-based approaches for ubiquitination site prediction in human proteins.

Affiliations

Department of Information Technology, Tarbiat Modares University, 14115-111, Tehran, Iran.

Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, 14115-111, Tehran, Iran.

Publication information

BMC Bioinformatics. 2023 Nov 28;24(1):449. doi: 10.1186/s12859-023-05581-w.

Abstract

Protein ubiquitination is a critical post-translational modification (PTM) involved in numerous cellular processes. Identifying ubiquitination sites (Ubi-sites) on proteins offers valuable insights into their function and regulatory mechanisms. Because traditional approaches to Ubi-site detection are costly and time-consuming, there has been growing interest in leveraging artificial intelligence for computer-aided Ubi-site prediction. In this study, we collected experimentally verified Ubi-sites of human proteins from the dbPTM database, then evaluated a comprehensive set of state-of-the-art computational methods for Ubi-site prediction, using standard evaluation metrics and a proper validation strategy. We demonstrated the effectiveness of our framework by comparing ten machine learning (ML) based approaches in three categories: feature-based conventional ML methods, end-to-end sequence-based deep learning (DL) techniques, and hybrid feature-based DL models. Our results revealed that DL approaches outperformed the classical ML methods. The best performance, obtained by a DL model using both raw amino acid sequences and hand-crafted features, was a 0.902 F1-score, 0.8198 accuracy, 0.8786 precision, and 0.9147 recall. Interestingly, our experimental results showed that the performance of DL methods was positively correlated with the length of the amino acid fragments, suggesting that utilizing the entire sequence could lead to more accurate predictions in future research. Additionally, we developed a meticulously curated benchmark for Ubi-site prediction in human proteins. This benchmark serves as a valuable resource for future studies, enabling fair and accurate comparisons between different methods. Overall, our work highlights the potential of ML, particularly DL techniques, in predicting Ubi-sites and furthering our knowledge of protein regulation through ubiquitination in cells.
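The abstract reports that DL performance grew with the length of the amino acid fragments fed to the model. As a rough illustration of what such fragments might look like, the sketch below cuts a fixed-length window centered on each lysine (K), the residue to which ubiquitin is attached. The function name `lysine_fragments`, the `flank` parameter, and the `X` padding symbol are hypothetical choices made for this example; the abstract does not describe the authors' actual preprocessing.

```python
from typing import List, Tuple

# Hypothetical padding symbol for window positions that fall outside the protein.
PAD = "X"


def lysine_fragments(sequence: str, flank: int) -> List[Tuple[int, str]]:
    """Return (1-based position, fragment) for every lysine (K) in `sequence`.

    Each fragment has length 2 * flank + 1 with the lysine at its center.
    Windows that run past either end of the protein are padded with PAD.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    fragments = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string,
            # so a centered window of width 2 * flank + 1 starts at index i.
            fragments.append((i + 1, padded[i : i + 2 * flank + 1]))
    return fragments


if __name__ == "__main__":
    # Toy sequence, not taken from dbPTM.
    for position, fragment in lysine_fragments("MKTAYK", flank=2):
        print(position, fragment)
```

Increasing `flank` gives the model more sequence context around each candidate site; in the limit the window covers the whole protein, which is the direction the abstract suggests for future work.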

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e38a/10683244/9d7116a33433/12859_2023_5581_Fig1_HTML.jpg
