Almalki Bander, Liao Li
Department of Computer and Information Sciences, University of Delaware, Smith Hall, 18 Amstel Avenue, Newark, DE 19716, USA.
Int J Mol Sci. 2025 Apr 30;26(9):4270. doi: 10.3390/ijms26094270.
Most bitopic transmembrane proteins associate with one another through interface residues to form dimers, which facilitate or activate specific cellular functions. Accurately identifying the interface residues of a given dimer is therefore crucial for understanding its function, and it has been a challenging task for many computational methods. These methods fall broadly into two categories: general-purpose methods that model dimerization and specialized methods that target interface residues. In this study, we develop a machine learning method that combines both approaches by integrating sequential and structural features extracted from predicted structures and various domains. Cross-validation on a benchmark dataset shows that our method, despite using significantly fewer features, outperforms the state-of-the-art methods by more than three percentage points in F1 score. Furthermore, we compared the proposed model on a benchmark dataset against state-of-the-art multimeric structure predictors, including RoseTTAFold2, AlphaFold2Multimer, and PREDDIMER. The proposed model outperformed all of these models, highlighting the effectiveness of integrating structural and sequential features within the proposed framework.
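Performance above is reported as the F1 score. The following is a minimal sketch that assumes interface prediction is scored per residue as a binary classification (1 = interface, 0 = non-interface). The abstract does not state the exact evaluation protocol. F1 is the harmonic mean of precision and recall. The function name `f1_score` and the label lists are hypothetical and for illustration only.

```python
def f1_score(y_true, y_pred):
    """Per-residue F1 for binary interface labels (1 = interface, 0 = non-interface).

    Hypothetical helper: assumes one label per residue in both sequences.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correctly predicted interface residues: precision or recall is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a 10-residue transmembrane segment.
true_labels = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
pred_labels = [0, 1, 1, 1, 1, 0, 0, 0, 1, 0]
print(round(f1_score(true_labels, pred_labels), 3))  # 0.8
```

In this toy example, 4 of the 5 predicted interface residues are correct (precision 0.8) and 4 of the 5 true interface residues are recovered (recall 0.8), so F1 is 0.8. A gain of "three percentage points" corresponds to an F1 difference of about 0.03 on this scale.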