EGPDI: identifying protein-DNA binding sites based on multi-view graph embedding fusion.

Affiliation

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.

Publication information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae330.

DOI:10.1093/bib/bbae330

PMID:38975896

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11229037/

Abstract

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying the binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. The approach integrates Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), configured independently to mine the global and local node embedding representations. A gated multi-head attention mechanism then captures the attention weights of the two embedding representations, facilitating the integration of node features. In addition, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.
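
The fusion step described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not EGPDI's implementation: the abstract names a gated multi-head attention mechanism, whereas this example uses a single per-dimension sigmoid gate with no attention heads. The function name `gated_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical, and the two input vectors stand for one residue's embeddings from the global and local views.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_global, h_local, gate_weights, gate_bias):
    """Fuse two embeddings of the same node with a learned-style gate.

    Assumption (not from the paper): the gate for dimension k is
    sigmoid(gate_weights[k] . [h_global; h_local] + gate_bias[k]),
    and the output is gate * h_global + (1 - gate) * h_local.
    """
    concat = h_global + h_local  # list concatenation: length 2d
    fused = []
    for k, (g_k, l_k) in enumerate(zip(h_global, h_local)):
        z = sum(w * x for w, x in zip(gate_weights[k], concat)) + gate_bias[k]
        gate = sigmoid(z)
        fused.append(gate * g_k + (1.0 - gate) * l_k)
    return fused


# With all-zero gate parameters every gate equals 0.5,
# so the fused vector is the plain average of the two views.
h_global = [1.0, 2.0]
h_local = [3.0, 6.0]
zero_weights = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(h_global, h_local, zero_weights, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, so each dimension of each residue could lean toward whichever view is more informative. The zero-parameter call only shows the neutral case where both views contribute equally.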

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73a9/11229037/2818691a454c/bbae330f1.jpg

Similar articles

Graph embedding-based novel protein interaction prediction via higher-order graph convolutional network.

PLoS One. 2020 Sep 24;15(9):e0238915. doi: 10.1371/journal.pone.0238915. eCollection 2020.

MVGCNMDA: Multi-view Graph Augmentation Convolutional Network for Uncovering Disease-Related Microbes.

Interdiscip Sci. 2022 Sep;14(3):669-682. doi: 10.1007/s12539-022-00514-2. Epub 2022 Apr 15.

MGAT: Multi-view Graph Attention Networks.

Neural Netw. 2020 Dec;132:180-189. doi: 10.1016/j.neunet.2020.08.021. Epub 2020 Aug 27.

Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning.

BMC Med Inform Decis Mak. 2024 Jun 6;24(1):159. doi: 10.1186/s12911-024-02564-6.

MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network.

Bioinformatics. 2024 Jan 5;40(5). doi: 10.1093/bioinformatics/btae269.

MAMF-GCN: Multi-scale adaptive multi-channel fusion deep graph convolutional network for predicting mental disorder.

Comput Biol Med. 2022 Sep;148:105823. doi: 10.1016/j.compbiomed.2022.105823. Epub 2022 Jul 6.

GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs.

BMC Genomics. 2024 May 9;25(1):406. doi: 10.1186/s12864-024-10299-x.

DSSGNN-PPI: A Protein-Protein Interactions prediction model based on Double Structure and Sequence graph neural networks.

Comput Biol Med. 2024 Jul;177:108669. doi: 10.1016/j.compbiomed.2024.108669. Epub 2024 May 29.

Prediction of circRNA-Disease Associations Based on the Combination of Multi-Head Graph Attention Network and Graph Convolutional Network.

Biomolecules. 2022 Jul 2;12(7):932. doi: 10.3390/biom12070932.

Cited by

Predicting nucleic acid binding sites by attention map-guided graph convolutional network with protein language embeddings and physicochemical information.

Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf457.

Advances in Language-Model-Informed Protein-Nucleic Acid Binding Site Prediction.

Methods Mol Biol. 2025;2941:139-151. doi: 10.1007/978-1-0716-4623-6_9.

A new strategy for Cas protein recognition based on graph neural networks and SMILES encoding.

Sci Rep. 2025 Apr 30;15(1):15236. doi: 10.1038/s41598-025-99999-2.

Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction.

Genes (Basel). 2024 Aug 18;15(8):1090. doi: 10.3390/genes15081090.
