PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK.
Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK.
Bioinformatics. 2021 May 23;37(8):1099-1106. doi: 10.1093/bioinformatics/btaa937.
Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either generic functional sites or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction sites using features derived from protein sequence, protein structure, and evolutionary data from CATH functional families (FunFams).
FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends strongly on the quality of the functional site annotations and on the information content of the FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive of functional sites.
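The abstract reports that conserved residues in FunFams are enriched in functional sites. As an illustration only, the sketch below scores per-position conservation in a multiple sequence alignment using normalized Shannon entropy, one common conservation measure. The abstract does not say how FunSite computes conservation, so this is not presented as the authors' method; the function names, the gap handling, the 20-letter amino acid alphabet and the 0.9 threshold are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical helper: conservation = 1 - H / log2(20), where H is the Shannon
# entropy (bits) of residue frequencies in an alignment column, gaps excluded.
# 1.0 means fully conserved; values near 0 mean highly variable.
MAX_ENTROPY = math.log2(20)  # assumes the 20 standard amino acids


def conservation_scores(alignment: list[str]) -> list[float]:
    """Return one conservation score per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:  # all-gap column: treat as unconserved (an assumption)
            scores.append(0.0)
            continue
        n = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(1.0 - entropy / MAX_ENTROPY)
    return scores


def conserved_positions(alignment: list[str], threshold: float = 0.9) -> list[int]:
    """Indices of columns whose conservation score meets a hypothetical threshold."""
    return [i for i, s in enumerate(conservation_scores(alignment)) if s >= threshold]


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "ACEG", "A-DH"]  # made-up sequences
    print(conserved_positions(toy_alignment))  # prints [0, 1]
```

In a real pipeline such scores would be computed over a FunFam alignment and used as one per-residue feature among others; a single entropy measure ignores sequence weighting and amino acid similarity, which more elaborate conservation scores account for.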
FunSite is available at https://github.com/UCL/cath-funsite-predictor.
Supplementary data are available at Bioinformatics online.