DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins.

Authors

Hassanzadeh Hamid Reza, Wang May D

Affiliations

Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332.

Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia 30332.

Publication information

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2016 Dec;2016:178-183. doi: 10.1109/bibm.2016.7822515. Epub 2017 Jan 19.

Abstract

Transcription factors (TFs) are macromolecules that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription. Finding the exact location of these binding sites (also known as motifs) is important in a variety of domains, such as drug design and development. To address this need, several in vivo and in vitro techniques have been developed that try to characterize and predict the binding specificity of a protein to different DNA loci. The major problem with these techniques is that they are not accurate enough in predicting binding affinity and characterizing the corresponding motifs. As a result, downstream analysis is required to uncover the locations where proteins of interest bind. Here, we propose DeeperBind, a long short-term recurrent convolutional network for predicting protein binding specificities with respect to DNA probes. DeeperBind can model the positional dynamics of probe sequences and hence effectively accounts for the contributions made by individual sub-regions of DNA sequences. Moreover, it can be trained and tested on datasets containing varying-length sequences. We apply our pipeline to datasets derived from protein binding microarrays (PBMs), an in vitro high-throughput technology for quantifying protein-DNA binding preferences, and present promising results. To the best of our knowledge, this is the most accurate pipeline for predicting the binding specificities of DNA sequences from data produced by high-throughput technologies, using deep learning for feature generation and positional dynamics modeling.
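
The abstract gives no architectural detail, so the following is only a minimal sketch of the general idea it names, not the authors' implementation. It assumes a one-hot encoding of the probe and a single hand-written convolutional kernel (the names `one_hot`, `scan`, and `TGA_KERNEL` and all weights are hypothetical). It shows how sliding a kernel along a probe yields one score per position, so probes of different lengths give score sequences of different lengths, which is the kind of positional signal a recurrent layer could then consume step by step.

```python
# Illustrative sketch only; not the DeeperBind implementation.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T.

    Characters outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(encoded, kernel):
    """Slide a (width x 4) kernel over the encoded probe (valid 1D convolution).

    Returns one score per window position, so the output length
    depends on the probe length.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


# Hypothetical kernel that rewards the 3-mer "TGA"; the weights are made up.
TGA_KERNEL = [
    [0.0, 0.0, 0.0, 1.0],  # position 0 prefers T
    [0.0, 0.0, 1.0, 0.0],  # position 1 prefers G
    [1.0, 0.0, 0.0, 0.0],  # position 2 prefers A
]

# Two probes of different lengths produce score sequences of different lengths.
short_scores = scan(one_hot("ACTGAC"), TGA_KERNEL)
long_scores = scan(one_hot("TGACCTGATT"), TGA_KERNEL)
```

In a trained network the kernel weights would be learned from PBM intensities rather than written by hand, and many kernels would run in parallel. The point here is only that the per-position scores keep track of where along the probe a match occurs.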
