Ge Fang, Zhu Yi-Heng, Xu Jian, Muhammad Arif, Song Jiangning, Yu Dong-Jun
School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.
School of Systems and Technology, Department of Informatics and System, University of Management and Technology, Lahore, 54770, Pakistan.
Comput Struct Biotechnol J. 2021 Nov 19;19:6400-6416. doi: 10.1016/j.csbj.2021.11.024. eCollection 2021.
Transmembrane proteins have critical biological functions and take part in many cellular processes, including cell signaling and the transport of molecules and ions across membranes. Approximately 60% of transmembrane proteins are considered drug targets. Missense mutations in such proteins can lead to diverse diseases and disorders, such as neurodegenerative diseases and cystic fibrosis. However, studies on mutations in transmembrane proteins remain limited. In this work, we first design a new feature encoding method, termed weight attenuation position-specific scoring matrix (WAPSSM), which builds upon protein evolutionary information. We then propose a new mutation prediction algorithm (cascade XGBoost) that draws on ideas from consensus predictors and gcForest. Multi-level experiments demonstrate the effectiveness of both WAPSSM and the cascade XGBoost algorithm. Finally, combining WAPSSM and three other types of features with the cascade XGBoost algorithm, we develop a new transmembrane protein mutation predictor, named MutTMPredictor. We benchmark MutTMPredictor against several existing predictors on seven datasets. On the 546 mutations dataset, MutTMPredictor achieves an accuracy (ACC) of 0.9661 and a Matthews correlation coefficient (MCC) of 0.8950. On the 67,584 dataset, MutTMPredictor achieves a score of 0.7523 and an area under curve (AUC) of 0.8746, which are 0.1625 and 0.0801 higher, respectively, than those of the best existing predictor (fathmm). MutTMPredictor also outperforms two specific predictors on the Pred-MutHTP datasets. The results suggest that MutTMPredictor can be used as an effective method for predicting and prioritizing missense mutations in transmembrane proteins. The MutTMPredictor webserver and datasets are freely accessible at http://csbio.njust.edu.cn/bioinf/muttmpredictor/ for academic use.
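For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, the Matthews correlation coefficient, and the area under the ROC curve are conventionally computed for a binary predictor. It is not taken from MutTMPredictor. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, and the area under curve is assumed here to mean the ROC curve.

```python
import math


def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, TN, FP, FN; a score >= threshold is called positive (assumed cutoff)."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion counts; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = disease-associated, 0 = neutral.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

counts = confusion_counts(labels, scores)
print("ACC:", round(accuracy(*counts), 4))
print("MCC:", round(matthews_corrcoef(*counts), 4))
print("AUC:", round(roc_auc(labels, scores), 4))
```

Accuracy and MCC depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason benchmarks often report a threshold-dependent metric and the AUC together.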