SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.

Affiliations

CNC-Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal.

Department of Life Sciences, Center for Neuroscience and Cell Biology, Coimbra University, 3000-456 Coimbra, Portugal.

Publication information

Int J Mol Sci. 2020 Oct 1;21(19):7281. doi: 10.3390/ijms21197281.

Abstract

Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small ligand binding and tend to be structural landmarks in protein-protein interactions. As such, they have been extensively studied with structure-based Machine Learning (ML) prediction methods. However, far more protein sequences are available than determined three-dimensional structures, which indicates that a sequence-based HS predictor has the potential to be more useful to the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, which only requires the user to submit a FASTA file with one or more protein sequences.
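
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall and F1-score are conventionally defined for a binary hot-spot / non-hot-spot classifier. It is not the SPOTONE evaluation code: the function name `classification_metrics` and the label lists are hypothetical and chosen only for illustration, and the abstract does not state which averaging scheme was used for its reported values. AUROC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = hot-spot)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical per-residue labels, for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall can diverge considerably when hot-spots and non-hot-spots are unevenly represented, which is one reason to report them alongside accuracy.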

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ba/7582262/9827d8de3004/ijms-21-07281-g0A1.jpg
