Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling.

Affiliations

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan.

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan.

Publication information

J Mol Biol. 2024 Nov 15;436(22):168769. doi: 10.1016/j.jmb.2024.168769. Epub 2024 Aug 29.

DOI: 10.1016/j.jmb.2024.168769
PMID: 39214282

Abstract

Deciphering the mechanisms governing protein-DNA interactions is crucial for understanding key cellular processes and disease pathways. In this work, we present a deep learning approach that advances the computational prediction of DNA-interacting residues from protein sequences. Our method leverages the contextual representations learned by pre-trained protein language models, such as ProtTrans, to capture intrinsic biochemical properties and sequence motifs indicative of DNA binding sites. We then integrate these contextual embeddings with a multi-window convolutional neural network architecture, which scans across the sequence at varying window sizes to identify both local and global binding patterns. Comprehensive evaluation on curated benchmark datasets shows that our approach achieves an area under the ROC curve (AUC) of 0.89, a substantial improvement over previous state-of-the-art sequence-based predictors. This showcases the potential of pairing advanced representation learning with deep neural network designs for uncovering the complex syntax governing protein-DNA interactions directly from primary sequences. Our work not only provides a robust computational tool for characterizing DNA-binding mechanisms, but also highlights the opportunities at the intersection of language modeling, deep learning, and protein sequence analysis. The code and data for this work are publicly available at https://github.com/B1607/DIRP, which facilitates broader adoption and continued development of these techniques for accelerating mechanistic insights into vital biological processes and disease pathways.
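
The multi-window scanning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows how filters of several window sizes can slide over per-residue embeddings and how their activations can be concatenated into one feature vector per residue. The window sizes (3, 5, 7), the embedding dimension of 8, the random weights, and the helper names `conv1d_same`, `multi_window_features`, and `random_kernels` are all assumptions made for illustration. In a real pipeline the embeddings would come from a pre-trained protein language model and the filter weights would be learned.

```python
import random


def conv1d_same(embeddings, kernel, bias=0.0):
    """Slide one filter over the sequence with zero padding, then apply ReLU.

    embeddings: list of per-residue vectors (length L, each of size D)
    kernel:     list of `window` weight vectors (each of size D), window odd
    Returns one activation per residue (length L).
    """
    window = len(kernel)
    half = window // 2
    length = len(embeddings)
    out = []
    for i in range(length):
        total = bias
        for k in range(window):
            j = i + k - half
            if 0 <= j < length:  # positions outside the sequence act as zeros
                total += sum(w * x for w, x in zip(kernel[k], embeddings[j]))
        out.append(max(0.0, total))  # ReLU non-linearity
    return out


def multi_window_features(embeddings, kernels_by_window):
    """Concatenate, per residue, the activations of all filters of all windows."""
    features = [[] for _ in embeddings]
    for window in sorted(kernels_by_window):
        for kernel in kernels_by_window[window]:
            for i, value in enumerate(conv1d_same(embeddings, kernel)):
                features[i].append(value)
    return features


def random_kernels(window_sizes, dim, filters_per_window, seed=0):
    """Hypothetical stand-in for trained filters: small random weights."""
    rng = random.Random(seed)
    return {
        w: [
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(w)]
            for _ in range(filters_per_window)
        ]
        for w in window_sizes
    }


# Toy input: 20 residues with 8-dimensional random vectors standing in for
# language-model embeddings (assumption; real embeddings are much larger).
rng = random.Random(1)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
kernels = random_kernels((3, 5, 7), dim=8, filters_per_window=4)
features = multi_window_features(embeddings, kernels)
print(len(features), len(features[0]))  # 20 12
```

Each residue ends up with 12 features here (3 window sizes times 4 filters). Small windows respond to short local motifs, while larger windows cover more sequence context. A per-residue classifier trained on such features would then output a binding score for each position.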

Similar articles

1
Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling.
J Mol Biol. 2024 Nov 15;436(22):168769. doi: 10.1016/j.jmb.2024.168769. Epub 2024 Aug 29.
2
Integrating Pre-Trained protein language model and multiple window scanning deep learning networks for accurate identification of secondary active transporters in membrane proteins.
Methods. 2023 Dec;220:11-20. doi: 10.1016/j.ymeth.2023.10.008. Epub 2023 Oct 21.
3
SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues.
Commun Biol. 2024 Jun 3;7(1):679. doi: 10.1038/s42003-024-06332-0.
4
DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab521.
5
DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence.
Int J Mol Sci. 2021 May 24;22(11):5521. doi: 10.3390/ijms22115521.
6
ProtTrans and multi-window scanning convolutional neural networks for the prediction of protein-peptide interaction sites.
J Mol Graph Model. 2024 Jul;130:108777. doi: 10.1016/j.jmgm.2024.108777. Epub 2024 Apr 17.
7
Improving the prediction of DNA-protein binding by integrating multi-scale dense convolutional network with fault-tolerant coding.
Anal Biochem. 2022 Nov 1;656:114878. doi: 10.1016/j.ab.2022.114878. Epub 2022 Aug 29.
8
High-Order Convolutional Neural Network Architecture for Predicting DNA-Protein Binding Sites.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1184-1192. doi: 10.1109/TCBB.2018.2819660. Epub 2018 Mar 26.
9
mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations.
J Mol Biol. 2024 Sep 1;436(17):168687. doi: 10.1016/j.jmb.2024.168687. Epub 2024 Jun 25.
10
SAResNet: self-attention residual network for predicting DNA-protein binding.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab101.

Cited by

1
Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.