School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae330.
Protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying the binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein function, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, which leaves room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. The approach integrates Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), each configured independently, to thoroughly mine global and local node embedding representations. A gated multi-head attention mechanism then computes attention weights over these two embedding representations, which are used to fuse the node features. In addition, node features from protein language models are introduced to provide further structural information. To our knowledge, this is the first application of multi-view graph embedding fusion to protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing show that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also support the superiority and generalization ability of EGPDI.
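To make the "local view plus fusion" idea more concrete, here is a minimal plain-Python sketch under stated assumptions. It is not EGPDI's implementation. `gcnii_layer` follows the published GCNII propagation rule (initial residual plus identity mapping, with β = log(λ/l + 1)). `gated_fusion` is a hypothetical, much simpler per-residue scalar gate that stands in for the paper's gated multi-head attention. The EGNN (global, coordinate-aware) branch is not reproduced; its output is assumed to be a precomputed embedding matrix. The helper names, the toy three-residue graph, and all weights are illustrative assumptions.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Renormalized adjacency P = D^-1/2 (A + I) D^-1/2, as used by GCN/GCNII."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcnii_layer(h: Matrix, h0: Matrix, p: Matrix, w: Matrix,
                layer: int, alpha: float = 0.1, lam: float = 0.5) -> Matrix:
    """One GCNII layer (local view):
    H' = ReLU(((1 - alpha) P H + alpha H0) ((1 - beta) I + beta W)),
    where beta = log(lam / layer + 1). W must be square (d x d)."""
    beta = math.log(lam / layer + 1.0)
    ph = matmul(p, h)
    # Initial residual: mix propagated features with the layer-0 features H0.
    support = [[(1 - alpha) * ph[i][j] + alpha * h0[i][j] for j in range(len(h[0]))]
               for i in range(len(h))]
    # Identity mapping: interpolate between I and the learned weight W.
    d = len(w)
    mix = [[(1 - beta) * (1.0 if i == j else 0.0) + beta * w[i][j] for j in range(d)]
           for i in range(d)]
    return [[max(0.0, x) for x in row] for row in matmul(support, mix)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_global: Matrix, h_local: Matrix,
                 gate_w: list[float], gate_b: float) -> Matrix:
    """Hypothetical scalar gate per residue: g = sigmoid(w . [h_g ; h_l] + b),
    fused = g * h_g + (1 - g) * h_l. A simplification, NOT gated multi-head attention."""
    fused = []
    for hg, hl in zip(h_global, h_local):
        g = sigmoid(sum(wi * xi for wi, xi in zip(gate_w, hg + hl)) + gate_b)
        fused.append([g * a + (1 - g) * b for a, b in zip(hg, hl)])
    return fused


if __name__ == "__main__":
    # Toy residue graph: a 3-residue chain 0-1-2 with 2-dim input features.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    p = normalized_adjacency(adj)
    h_local = gcnii_layer(h0, h0, p, w=[[0.8, 0.1], [0.1, 0.8]], layer=1)
    h_global = [[0.2, 0.4], [0.6, 0.1], [0.3, 0.3]]  # placeholder for an EGNN output
    print(gated_fusion(h_global, h_local, gate_w=[0.5, -0.5, 0.2, 0.1], gate_b=0.0))
```

The identity-mapping term keeps each layer close to the identity for small β, which is what allows GCNII to be stacked deeply without over-smoothing. A learned gate, by contrast, lets each residue weight the two views differently rather than averaging them uniformly.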