Computational modeling of in vivo and in vitro protein-DNA interactions by multiple instance learning.

Affiliation

Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA.

Publication information

Bioinformatics. 2017 Jul 15;33(14):2097-2105. doi: 10.1093/bioinformatics/btx115.

DOI:10.1093/bioinformatics/btx115

PMID:28334224

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5870851/

Abstract

MOTIVATION

The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. While the development of both in vivo and in vitro profiling techniques has significantly enhanced our knowledge of transcription factor (TF)-DNA interactions, computational models of TF-DNA interactions are relatively simple and may not reveal sufficient biological insight. In particular, supervised-learning-based models of TF-DNA interactions attempt to map sequence-level features (k-mers) to binding events but usually ignore the locations of the k-mers, which can cause data fragmentation and consequently inferior model performance.

RESULTS

Here, we propose a novel algorithm based on the multiple-instance learning (MIL) paradigm. MIL breaks each DNA sequence into multiple overlapping subsequences and models each subsequence separately. It therefore implicitly takes binding site locations into consideration, which yields both higher accuracy and better interpretability of the models. Results on both in vivo and in vitro TF-DNA interaction data show that our approach significantly outperforms conventional single-instance-learning-based algorithms. Importantly, models learned from in vitro data using our approach can predict in vivo binding with very good accuracy. In addition, the location information obtained by our method provides additional insight into motif-finding results from ChIP-Seq data. Finally, our approach can be easily combined with other state-of-the-art TF-DNA interaction modeling methods.
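The multiple-instance idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the model, so the linear k-mer weights, the window length, and the max-over-instances aggregation used here are all illustrative assumptions (max pooling is a common MIL convention, not a confirmed detail of this paper). Each sequence is treated as a bag, its overlapping subsequences as instances, and the position of the best-scoring instance serves as a rough estimate of the binding site location.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_into_instances(seq: str, window: int, step: int = 1) -> List[Tuple[int, str]]:
    """Break one sequence (the bag) into overlapping subsequences (the instances)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        return [(0, seq)]
    return [(i, seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer in a sequence, ignoring where it occurs."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score_instance(instance: str, weights: Dict[str, float], k: int) -> float:
    """Score one instance with hypothetical linear k-mer weights (an assumption)."""
    counts = kmer_counts(instance, k)
    return sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def score_bag(seq: str, weights: Dict[str, float], k: int,
              window: int, step: int = 1) -> Tuple[float, int]:
    """Return (bag score, start of best instance), taking the max over instances."""
    scored = [
        (score_instance(sub, weights, k), start)
        for start, sub in split_into_instances(seq, window, step)
    ]
    # max() returns the first best-scoring instance when several tie.
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Toy weights for illustration only; real weights would be learned from data.
    toy_weights = {"GATA": 1.0, "ATAA": 0.5}
    score, start = score_bag("CCCCGATAACCCC", toy_weights, k=4, window=6)
    print(f"bag score = {score}, best instance starts at position {start}")
```

Because the bag score comes from a single localized window rather than from k-mer counts pooled over the whole sequence, the sketch also reports where the evidence for binding was found, which is the location information the abstract refers to.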

AVAILABILITY AND IMPLEMENTATION

http://www.cs.utsa.edu/~jruan/MIL/.

CONTACT

jianhua.ruan@utsa.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
