DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha 410075, China.

School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China.

Publication information

Int J Mol Sci. 2021 May 24;22(11):5521. doi: 10.3390/ijms22115521.

Abstract

Predicting in vivo protein-DNA binding sites is a challenging but pressing task in fields such as drug design and development. Most promoters contain a number of transcription factor (TF) binding sites, but only a small minority have been identified by biochemical experiments, which are time-consuming and laborious. To tackle this challenge, many computational methods have been proposed to predict TF binding sites from DNA sequence. Although previous methods have achieved remarkable performance in predicting protein-DNA interactions, there is still considerable room for improvement. In this paper, we present a hybrid deep learning framework, termed DeepD2V, for TF binding site prediction. First, we construct the input matrix from an original DNA sequence and three variants of it: its inverse, complementary, and complementary inverse sequences. A sliding window of size k with a specific stride is used to obtain the k-mer representation of the input sequences. Next, we use word2vec to obtain a pre-trained distributed representation model of the k-mer words. Finally, a combined recurrent and convolutional neural network predicts the probability of protein-DNA binding. Experimental results on 50 public ChIP-seq benchmark datasets demonstrate the superior performance and robustness of DeepD2V. Moreover, we verify that DeepD2V performs better with the word2vec-based k-mer distributed representation than with one-hot encoding, and that the integrated framework of a convolutional neural network (CNN) and a bidirectional LSTM (bi-LSTM) outperforms either the CNN or the bi-LSTM model used alone. The source code of DeepD2V is available at the GitHub repository.
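The input construction described in the abstract can be illustrated with a minimal sketch. It is not taken from the DeepD2V source code: the function names, the default k = 3, and the default stride of 1 are assumptions chosen for illustration only, and the abstract does not state the values the authors used.

```python
# Illustrative sketch only: k, stride, and all function names are assumptions,
# not values or identifiers from the DeepD2V repository.

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sequence_variants(seq):
    """Return the original sequence followed by its inverse, complementary,
    and complementary inverse forms, in that order."""
    seq = seq.upper()
    inverse = seq[::-1]
    complementary = seq.translate(COMPLEMENT)
    complementary_inverse = complementary[::-1]
    return [seq, inverse, complementary, complementary_inverse]


def kmers(seq, k=3, stride=1):
    """Slide a window of size k along seq, advancing by stride each step."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def tokenize(seq, k=3, stride=1):
    """Split all four sequence variants into lists of k-mer 'words'."""
    return [kmers(variant, k, stride) for variant in sequence_variants(seq)]


if __name__ == "__main__":
    for tokens in tokenize("ACGTTG"):
        print(" ".join(tokens))
```

Each list of k-mers plays the role of a sentence of words, which is the form of input a word2vec implementation expects when learning the distributed representation. The embedded k-mers would then be passed to the CNN and bi-LSTM layers, which this sketch does not cover.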


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c11/8197256/095eefd9c57b/ijms-22-05521-g001.jpg
