Tan Kuan Pern, Kanitkar Tejashree Rajaram, Kwoh Chee Keong, Madhusudhan Mallur Srivatsan
Bioinformatics Institute, Singapore, Singapore.
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore.
Front Mol Biosci. 2021 Aug 20;8:646288. doi: 10.3389/fmolb.2021.646288. eCollection 2021.
Predicting the functional consequences of single point mutations is relevant to protein function annotation and to clinical analysis and diagnosis. We developed and tested Packpred, a method that combines a multi-body clique statistical potential with a depth-dependent amino acid substitution matrix (FADHM) and positional Shannon entropy to predict the functional consequences of point mutations in proteins. Parameters were trained on a saturation mutagenesis data set of T4-lysozyme (1,966 mutations). The method was tested on another saturation mutagenesis data set (CcdB; 1,534 mutations) and on the Missense3D data set (4,099 mutations). The performance of Packpred was compared against that of six other contemporary methods. With Matthews correlation coefficient (MCC) values of 0.42, 0.47, and 0.36 on the T4-lysozyme training set and the CcdB and Missense3D test sets, respectively, Packpred outperforms all other methods on all data sets, except that it marginally underperforms FADHM on the CcdB data set. A meta-server analysis was also performed, in which the best-performing method was chosen for each wild-type amino acid and for each wild-type/mutant amino acid pair. On the Missense3D data set, this raised the MCC to 0.40 and 0.51 for the two meta predictors, respectively. We conjecture that better meta predictors could improve accuracy further because, among the seven methods compared, at least one method correctly predicts ∼99% of the data.
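Two of the quantities named in the abstract, the MCC used for evaluation and the positional Shannon entropy used as a feature, have standard textbook definitions. The following is a minimal sketch of those definitions only, not Packpred's implementation. The function names, the choice of log base 2, and the handling of gap characters are illustrative assumptions.

```python
import math
from collections import Counter


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (truthy = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def positional_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column.

    Assumption: gap characters are ignored; the abstract does not say
    how gaps are treated.
    """
    residues = [r for r in column if r != gap]
    n = len(residues)
    if n == 0:
        return 0.0
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `mcc([1, 1, 0, 0], [1, 1, 0, 0])` gives a perfect score of 1.0, and `positional_entropy("ACAC")` gives 1.0 bit, whereas a fully conserved column such as `"AAAA"` gives 0.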