Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India.
Laboratory of Biochemistry and Genetics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA.
BMC Bioinformatics. 2017 Dec 22;18(1):583. doi: 10.1186/s12859-017-1987-z.
Knowledge of catalytic residues can play an essential role in elucidating the mechanistic details of an enzyme. However, experimental identification of catalytic residues is tedious and time-consuming, and it can be expedited by computational prediction. Despite significant progress in active-site prediction methods, one remaining issue is the position at which putative catalytic residues appear among all ranked residues. To improve both the ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach-based method, CSmetaPred. In this approach, residues are ranked by the mean of normalized residue scores derived from four well-known catalytic residue predictors. In a second meta-predictor, CSmetaPred_poc, the mean residue score of CSmetaPred is combined with predicted pocket information to further improve prediction performance.
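The ranking step described above (mean of normalized per-residue scores across several predictors) can be illustrated with a minimal sketch. It is not the CSmetaPred implementation: the abstract does not state the normalization scheme, so min-max scaling is an assumption here, and the predictor names, residue numbers, and scores are hypothetical placeholders. The pocket-based adjustment used by CSmetaPred_poc is not shown, since its form is not given in the text.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale a {residue: score} mapping to [0, 1] (assumed normalization).

    If all scores are equal, every residue maps to 0.0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {res: (s - lo) / span if span else 0.0 for res, s in scores.items()}


def meta_rank(predictor_scores):
    """Rank residues by the mean of their normalized scores (rank 1 = best).

    predictor_scores maps a predictor name to its {residue: raw score} dict.
    Only residues scored by every predictor are ranked.
    """
    normalized = [min_max_normalize(s) for s in predictor_scores.values()]
    residues = set.intersection(*(set(n) for n in normalized))
    meta = {res: mean(n[res] for n in normalized) for res in residues}
    # Higher mean score ranks first; ties are broken by residue number.
    ordered = sorted(meta, key=lambda r: (-meta[r], r))
    ranks = {res: rank for rank, res in enumerate(ordered, start=1)}
    return ranks, meta


if __name__ == "__main__":
    # Hypothetical raw scores from two placeholder predictors on different scales.
    example = {
        "predictor_a": {57: 0.9, 102: 0.7, 195: 0.8, 43: 0.1},
        "predictor_b": {57: 12.0, 102: 9.0, 195: 11.0, 43: 2.0},
    }
    ranks, meta = meta_rank(example)
    for res in sorted(ranks, key=ranks.get):
        print(res, ranks[res], round(meta[res], 3))
```

Normalizing before averaging keeps a predictor with a wide raw score range from dominating the mean, which is the usual motivation for this kind of score-level combination.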
Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Visual and quantitative analysis of the ROC and PR curves shows that both meta-predictors outperform their constituent methods and that CSmetaPred_poc performs best among the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure of the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is lowest (best) for CSmetaPred_poc. Treating residues ranked ≤20 as positive predictions in a binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc places all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking on comparative-modelled structures showed that predictions made from models are better than predictions based on sequence alone. These analyses suggest that CSmetaPred_poc ranks putative catalytic residues at lower (better) positions, which can facilitate and expedite their experimental characterization.
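The rank-based measures quoted above (accuracy at a rank cutoff of 20, the share of enzymes with every catalytic residue inside the top 20, and the median rank of catalytic residues) can be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, accuracy is computed per enzyme as (TP + TN) / all residues, and the abstract does not say whether the reported figures are pooled or averaged across enzymes.

```python
from statistics import median


def top_k_accuracy(ranks, catalytic, k=20):
    """Binary accuracy when residues ranked <= k are called catalytic.

    ranks: {residue: rank} with rank 1 = best; catalytic: set of true residues.
    """
    tp = sum(1 for r in ranks if ranks[r] <= k and r in catalytic)
    tn = sum(1 for r in ranks if ranks[r] > k and r not in catalytic)
    return (tp + tn) / len(ranks)


def all_within_top_k(ranks, catalytic, k=20):
    """True if every known catalytic residue is ranked within the top k."""
    return all(ranks[r] <= k for r in catalytic)


def fraction_fully_recovered(enzymes, k=20):
    """Share of enzymes whose catalytic residues all fall within the top k.

    enzymes: list of (ranks, catalytic) pairs, one per enzyme.
    """
    hits = sum(all_within_top_k(ranks, cat, k) for ranks, cat in enzymes)
    return hits / len(enzymes)


def median_catalytic_rank(ranks, catalytic):
    """Median predicted rank of the known catalytic residues (lower is better)."""
    return median(ranks[r] for r in catalytic)


if __name__ == "__main__":
    # Hypothetical 100-residue enzymes where residue number equals its rank.
    ranks = {i: i for i in range(1, 101)}
    enzymes = [(ranks, {1, 5, 30}), (ranks, {2, 3})]
    print(top_k_accuracy(ranks, {1, 5, 30}))
    print(fraction_fully_recovered(enzymes))
    print(median_catalytic_rank(ranks, {1, 5, 30}))
```

Because catalytic residues are a small minority of any enzyme, accuracy at a fixed cutoff is dominated by true negatives, which is one reason the abstract also reports rank-based and PR-curve measures.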
The benchmarking studies showed that a meta-approach combining residue-level scores from well-known catalytic residue predictors can improve prediction accuracy and place known catalytic residues at better ranked positions. Such predictions can therefore help experimentalists prioritize residues for mutational studies when characterizing catalytic residues. Both meta-predictors are available as a web server at: http://14.139.227.206/csmetapred/ .