Prediction of Anti-Freezing Proteins From Their Evolutionary Profile.

Authors

Kumar Nishant, Choudhury Shubham, Bajiya Nisha, Patiyal Sumeet, Raghava Gajendra P S

Affiliations

Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

Proteomics. 2025 Feb;25(3):e202400157. doi: 10.1002/pmic.202400157. Epub 2024 Sep 20.

Abstract

Prediction of antifreeze proteins (AFPs) is important because of their diverse applications in healthcare. An inherent limitation of current AFP prediction methods is their reliance on unreviewed proteins for evaluation. This study evaluates both the proposed and existing methods on an independent dataset containing 80 AFPs and 73 non-AFPs obtained from UniProt, all of which have already been reviewed by experts. Initially, we constructed machine learning models for AFP prediction using selected composition-based protein features and achieved a peak AUROC of 0.90 with an MCC of 0.69 on the independent dataset. Subsequently, we observed a notable improvement in model performance, with the AUROC increasing from 0.90 to 0.93 when evolutionary information was incorporated instead of relying solely on the primary sequence of proteins. Furthermore, we explored hybrid models integrating our machine learning approaches with BLAST-based similarity and motif-based methods. However, the performance of these hybrid models either matched or was inferior to that of our best machine learning model. Our best model, based on evolutionary information, outperforms all existing methods on the independent validation dataset. For the convenience of users, a user-friendly web server with a standalone package named "AFPropred" was developed (https://webs.iiitd.edu.in/raghava/afpropred).
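The abstract reports performance as AUROC and MCC. As a reminder of what those two metrics measure, here is a minimal pure-Python sketch. The labels, scores, and the 0.5 decision threshold are made-up toy values chosen for illustration, and the function names `auroc` and `mcc` are hypothetical. None of this comes from the paper or from the AFPropred package.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = AFP, 0 = non-AFP, scores are model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# Confusion matrix at an assumed threshold of 0.5.
preds = [1 if s >= 0.5 else 0 for s in scores]
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"MCC   = {mcc(tp, fp, tn, fn):.3f}")
```

AUROC is threshold-free and depends only on how the scores rank positives against negatives, while MCC is computed at one fixed threshold and uses all four confusion-matrix cells, which is why the two are often reported together for a roughly balanced set such as the 80 AFPs and 73 non-AFPs described above.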
