CNC-Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal.
Department of Life Sciences, Center for Neuroscience and Cell Biology, Coimbra University, 3000-456 Coimbra, Portugal.
Int J Mol Sci. 2020 Oct 1;21(19):7281. doi: 10.3390/ijms21197281.
Protein hot-spots (HS) are experimentally determined amino acids that are key to small-ligand binding and tend to be structural landmarks in protein-protein interactions. As such, they have been extensively targeted by structure-based machine learning (ML) prediction methods. However, far more protein sequences are available than determined three-dimensional structures, which suggests that a sequence-based HS predictor has the potential to be more useful to the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS using sequence-only features. On an independent testing set, this algorithm achieves accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively. The algorithm is deployed in a free-to-use webserver at http://moreiralab.com/resources/spotone, which only requires the user to submit a FASTA file with one or more protein sequences.
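Since the webserver takes a FASTA file with one or more protein sequences as its only input, it may help to check such a file locally before submission. The following is a minimal sketch, not the server's own validation: the function names `parse_fasta` and `non_standard_residues`, the example records, and the choice to flag anything outside the 20 standard one-letter amino acid codes are all assumptions made for illustration. The abstract does not state which residue symbols SPOTONE accepts.

```python
from typing import Dict, Iterable, Set

# The 20 standard amino acids in one-letter code. Whether the server accepts
# other symbols (e.g. X, B, Z, U) is NOT stated in the abstract; this set is
# an assumption used only to flag residues worth a second look.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header")
            # Multi-line sequences are concatenated under the current header.
            records[header] += line.upper()
    return records


def non_standard_residues(sequence: str) -> Set[str]:
    """Return the symbols in `sequence` outside the 20 standard amino acids."""
    return set(sequence) - STANDARD_AMINO_ACIDS


# Hypothetical two-record input, made up for this example.
example = """>protein_A hypothetical example
MKTAYIAKQR
QISFVKSHFS
>protein_B hypothetical example
GSHMXLEDP
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), sorted(non_standard_residues(seq)))
```

To check a real file, the same function can be fed an open file handle, for instance `parse_fasta(open("my_proteins.fasta"))`, where the file name is a placeholder.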