Wang Xiaohua, Zhang Ming, Yang Xibei, Yu Dong-Jun, Ge Fang
School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China.
School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.
J Chem Inf Model. 2024 Dec 23;64(24):9626-9642. doi: 10.1021/acs.jcim.4c01999. Epub 2024 Nov 28.
Accurately predicting disease-related mutations in G protein-coupled receptors (GPCRs) is critical for advancing disease diagnosis and drug discovery. To address this need, we present GPTrans, a highly accurate predictor of disease-related mutations in GPCRs. The core innovation of GPTrans is a novel feature extraction network that integrates features from both wildtype and mutant protein variant sites, using multifeature connections within a transformer framework to ensure comprehensive feature extraction. A key contributor to GPTrans's effectiveness is a deep feature integration strategy that merges embeddings and class tokens from multiple protein language models, including Evolutionary Scale Modeling (ESM) and ProtTrans, thus shedding light on the biochemical properties of proteins. Leveraging transformer components and a self-attention mechanism, GPTrans captures higher-level representations of protein features. Fusing wildtype and mutation site information not only enriches the predictive feature set but also avoids the overestimation commonly associated with sequence-based predictions. This approach distinguishes GPTrans and enables it to significantly outperform existing methods. Our evaluations on GPCR data sets derived from ClinVar and MutHTP demonstrate GPTrans's superior performance, with average AUC values of 0.874 and 0.590, respectively, in 10-fold cross-validation. Notably, compared to the AlphaMissense method, GPTrans achieved a 38.03% improvement in accuracy when predicting disease-associated mutations in the MutHTP data set. A thorough analysis of the predicted results further validates the model's effectiveness. The source code, data sets, and prediction results for GPTrans are available for academic use at https://github.com/EduardWang/GPTrans.
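The feature integration idea described in the abstract can be illustrated with a minimal sketch. It is not the GPTrans implementation: it assumes that fusion is plain concatenation, that each language model returns per-residue embeddings plus one class token, and it uses toy numbers in place of real model outputs. The names `site_features`, `fuse_variant`, and `toy_output` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def site_features(per_residue: Sequence[Vector], cls_token: Vector, site: int) -> Vector:
    """Concatenate the embedding at the variant site with the model's class token."""
    return list(per_residue[site]) + list(cls_token)


def fuse_variant(wild: Dict[str, dict], mutant: Dict[str, dict], site: int) -> Vector:
    """Build one feature vector from wildtype and mutant outputs of every model.

    Assumption: fusion is simple concatenation, ordered by model name and then
    wildtype before mutant. The published method may fuse features differently.
    """
    fused: Vector = []
    for model_name in sorted(wild):
        for outputs in (wild[model_name], mutant[model_name]):
            fused += site_features(outputs["residues"], outputs["cls"], site)
    return fused


def toy_output(length: int, dim: int, offset: float) -> dict:
    """Stand-in for a protein language model output (toy values, not real embeddings)."""
    residues = [[offset + i + 0.1 * d for d in range(dim)] for i in range(length)]
    return {"residues": residues, "cls": [offset] * dim}


# Two hypothetical models with different embedding sizes, for a 5-residue sequence.
wild = {"esm": toy_output(5, 3, 0.0), "prottrans": toy_output(5, 2, 10.0)}
mutant = {"esm": toy_output(5, 3, 1.0), "prottrans": toy_output(5, 2, 11.0)}

features = fuse_variant(wild, mutant, site=2)
print(len(features))
```

In a real pipeline the fused vector for each variant would be passed to a downstream classifier, which the abstract describes as transformer-based; that stage is omitted here.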