Farheen Nida, Sen Neeladri, Nair Sanjana, Tan Kuan Pern, Madhusudhan M S
Indian Institute of Science Education and Research, Pune 411008, India.
Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671; School of Computer Engineering, Nanyang Technological University, Singapore 639798.
Prog Biophys Mol Biol. 2017 Sep;128:14-23. doi: 10.1016/j.pbiomolbio.2017.02.004. Epub 2017 Feb 15.
The 20 naturally occurring amino acids differ in their preferences for particular environments within protein structures. One way to classify these environments is by their proximity to solvent, as measured by residue depth. Because amino acid frequencies differ across depth levels, substitution frequencies should also vary with depth. To quantify these substitution frequencies, we built depth-dependent substitution matrices from a dataset of 3696 high-quality, non-redundant pairwise protein structural alignments. One application of these matrices is predicting how well mutations are tolerated in different protein environments. Using these substitution scores, we predicted deleterious mutations among 3500 mutations in T4 lysozyme and CcdB. Measured by the Matthews Correlation Coefficient (MCC), the method scores 0.48 on the CcdB testing set, whereas the best of the other methods tested scores 0.40. Further development of these substitution matrices could help improve structure-sequence alignment for protein 3D structure modeling.
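To make the idea of a depth-dependent substitution matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes aligned residue pairs annotated with a residue depth in angstroms, bins them by depth, and computes a standard log-odds score per bin. The bin edges (4.0 and 7.0), the pseudocount, and the names `depth_bin`, `log_odds_by_depth` and `mcc` are all hypothetical; the abstract does not state the binning or normalization actually used. The `mcc` helper is the usual Matthews Correlation Coefficient formula computed from confusion-matrix counts.

```python
import math
from collections import Counter, defaultdict


def depth_bin(depth, edges=(4.0, 7.0)):
    """Map a residue depth (angstroms) to a bin index. Edges are illustrative only."""
    return sum(depth >= e for e in edges)


def log_odds_by_depth(aligned_pairs, edges=(4.0, 7.0), pseudocount=1.0):
    """Build one log-odds table per depth bin from (res_a, res_b, depth) triples."""
    pair_counts = defaultdict(Counter)
    for a, b, depth in aligned_pairs:
        k = depth_bin(depth, edges)
        # Count both orderings so each table is symmetric.
        pair_counts[k][(a, b)] += 1
        pair_counts[k][(b, a)] += 1

    matrices = {}
    for k, counts in pair_counts.items():
        residues = sorted({r for pair in counts for r in pair})
        total = sum(counts.values()) + pseudocount * len(residues) ** 2
        # Background frequency of each residue within this depth bin.
        marginal = {
            r: sum(counts[(r, s)] + pseudocount for s in residues) / total
            for r in residues
        }
        # Log-odds: observed pair frequency over the frequency expected by chance.
        matrices[k] = {
            (r, s): math.log2(
                ((counts[(r, s)] + pseudocount) / total)
                / (marginal[r] * marginal[s])
            )
            for r in residues
            for s in residues
        }
    return matrices


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy input, invented purely for illustration.
toy_pairs = [
    ("L", "I", 8.0), ("L", "L", 8.0), ("L", "L", 8.0),
    ("K", "K", 2.0), ("K", "E", 2.0),
]
toy_matrices = log_odds_by_depth(toy_pairs)
```

In a sketch like this, a mutation at a given depth would be scored by looking up the wild-type and mutant residues in the table for that depth bin, with low scores suggesting poorly tolerated substitutions. How the published method thresholds or combines such scores is not described in the abstract.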