Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):679-689. doi: 10.1109/TCBB.2018.2864203. Epub 2018 Aug 7.

Abstract

Although convolutional neural networks (CNNs) have outperformed conventional methods in predicting the sequence specificities of protein-DNA binding in recent years, they do not take full advantage of the intrinsic weakly-supervised information in DNA sequences, namely that a bound sequence may contain multiple transcription factor binding sites (TFBSs). Here, we propose a weakly-supervised convolutional neural network architecture (WSCNN) that combines multiple-instance learning (MIL) with CNNs to further improve the prediction of protein-DNA binding. WSCNN first divides each DNA sequence (a bag) into multiple overlapping subsequences (instances) with a sliding window, then models each instance separately using a CNN, and finally fuses the predicted scores of all instances in the same bag using one of four fusion methods: Max, Average, Linear Regression, and Top-Bottom Instances. The experimental results on in vivo and in vitro datasets illustrate the performance of the proposed approach. Moreover, models built on in vitro data using WSCNN can predict in vivo protein-DNA binding with good accuracy. In addition, through a series of experiments, we quantitatively analyze the importance of the reverse-complement mode in predicting in vivo protein-DNA binding and explain why we do not directly use advanced pooling layers to combine MIL with CNNs.
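The bag-and-instance pipeline described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `sliding_window_instances` and `fuse_scores`, the window and stride values, the instance scores, and the linear-regression weights are all hypothetical, and the Top-Bottom rule is assumed here to be the sum of the `k` highest and `k` lowest instance scores. In the actual method the per-instance scores would come from a trained CNN and the fusion weights would be learned.

```python
import numpy as np


def sliding_window_instances(seq, window, stride):
    """Split one DNA sequence (a bag) into overlapping subsequences (instances)."""
    if window >= len(seq):
        return [seq]  # sequence too short to split: the bag holds one instance
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def fuse_scores(scores, method="max", k=1, weights=None, bias=0.0):
    """Fuse the per-instance scores of one bag into a single bag-level score."""
    s = np.asarray(scores, dtype=float)
    if method == "max":
        return float(s.max())
    if method == "average":
        return float(s.mean())
    if method == "top_bottom":
        # Assumption: sum of the k highest and the k lowest instance scores.
        ordered = np.sort(s)
        return float(ordered[-k:].sum() + ordered[:k].sum())
    if method == "linear":
        # Weighted sum of instance scores; weights are hypothetical here and
        # would be learned jointly with the network in practice.
        w = np.asarray(weights, dtype=float)
        return float(s @ w + bias)
    raise ValueError(f"unknown fusion method: {method}")


# Usage: one bag of three overlapping instances with made-up CNN scores.
instances = sliding_window_instances("ACGTACGT", window=4, stride=2)
instance_scores = [0.1, 0.9, 0.4]  # hypothetical outputs of the instance model
bag_max = fuse_scores(instance_scores, "max")
bag_avg = fuse_scores(instance_scores, "average")
bag_tb = fuse_scores(instance_scores, "top_bottom", k=1)
bag_lin = fuse_scores(instance_scores, "linear", weights=[0.2, 0.5, 0.3])
```

Max fusion lets a single strong instance decide the bag label, while Average and Top-Bottom also let low-scoring instances influence the result, which is one way to read the abstract's point that a bound sequence may contain more than one binding site.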

Similar articles

Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network.

Sci Rep. 2019 Jun 11;9(1):8484. doi: 10.1038/s41598-019-44966-x.

High-Order Convolutional Neural Network Architecture for Predicting DNA-Protein Binding Sites.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1184-1192. doi: 10.1109/TCBB.2018.2819660. Epub 2018 Mar 26.

Multi-Scale Capsule Network for Predicting DNA-Protein Binding Sites.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1793-1800. doi: 10.1109/TCBB.2020.3025579. Epub 2021 Oct 7.

Computational modeling of in vivo and in vitro protein-DNA interactions by multiple instance learning.

Bioinformatics. 2017 Jul 15;33(14):2097-2105. doi: 10.1093/bioinformatics/btx115.

Predicting in-vitro Transcription Factor Binding Sites Using DNA Sequence + Shape.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):667-676. doi: 10.1109/TCBB.2019.2947461. Epub 2021 Apr 6.

DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence.

Int J Mol Sci. 2021 May 24;22(11):5521. doi: 10.3390/ijms22115521.

Enabling full-length evolutionary profiles based deep convolutional neural network for predicting DNA-binding proteins from sequence.

Proteins. 2020 Jan;88(1):15-30. doi: 10.1002/prot.25763. Epub 2019 Jul 8.

WSHNN: A Weakly Supervised Hybrid Neural Network for the Identification of DNA-Protein Binding Sites.

Curr Comput Aided Drug Des. 2024 Feb 12. doi: 10.2174/0115734099277249240129114123.

BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.

Cited by

Artificial intelligence: the human response to approach the complexity of big data in biology.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf057.

A KAN-based hybrid deep neural networks for accurate identification of transcription factor binding sites.

PLoS One. 2025 May 7;20(5):e0322978. doi: 10.1371/journal.pone.0322978. eCollection 2025.

MLSNet: a deep learning model for predicting transcription factor binding sites.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae489.

DNA breathing integration with deep learning foundational model advances genome-wide binding prediction of human transcription factors.

Nucleic Acids Res. 2024 Oct 28;52(19):e91. doi: 10.1093/nar/gkae783.

MFPINC: prediction of plant ncRNAs based on multi-source feature fusion.

BMC Genomics. 2024 May 30;25(1):531. doi: 10.1186/s12864-024-10439-3.

WSHNN: A Weakly Supervised Hybrid Neural Network for the Identification of DNA-Protein Binding Sites.

Curr Comput Aided Drug Des. 2024 Feb 12. doi: 10.2174/0115734099277249240129114123.

Advancing Transcription Factor Binding Site Prediction Using DNA Breathing Dynamics and Sequence Transformers via Cross Attention.

bioRxiv. 2024 Feb 15:2024.01.16.575935. doi: 10.1101/2024.01.16.575935.

4acCPred: Weakly supervised prediction of N4-acetyldeoxycytosine DNA modification from sequences.

Mol Ther Nucleic Acids. 2022 Oct 14;30:337-345. doi: 10.1016/j.omtn.2022.10.004. eCollection 2022 Dec 13.

A novel deep learning-assisted hybrid network for plasmodium falciparum parasite mitochondrial proteins classification.

PLoS One. 2022 Oct 6;17(10):e0275195. doi: 10.1371/journal.pone.0275195. eCollection 2022.

Efficient and accurate diagnosis of otomycosis using an ensemble deep-learning model.

Front Mol Biosci. 2022 Aug 19;9:951432. doi: 10.3389/fmolb.2022.951432. eCollection 2022.
