Detection of transcription factors binding to methylated DNA by deep recurrent neural network.

Author information

College of Information and Computer Engineering, Northeast Forestry University, China.

School of Management, Henan Institute of Technology, China.

Publication information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab533.

Abstract

Transcription factors (TFs) are proteins specifically involved in regulating gene expression. It is generally accepted in epigenetics that methylated nucleotides can prevent TFs from binding to DNA fragments. However, recent studies have confirmed that some TFs are able to interact with methylated DNA fragments and thereby further regulate gene expression. Although biochemical experiments can identify TFs that bind to methylated DNA sequences, these wet-lab methods are time-consuming and expensive. Machine learning methods offer a way to identify these TFs quickly without experimental materials. This study therefore aims to design a robust predictor for detecting methylated DNA-bound TFs. We first proposed using tripeptide word vector features to represent protein samples. Then, based on a recurrent neural network with long short-term memory, we designed a two-step computational model. The first-step predictor discriminates TFs from non-TFs. Once a protein is predicted to be a TF, the second-step predictor judges whether it can bind to methylated DNA. In the independent dataset test, the accuracies of the first and second steps were 86.63% and 73.59%, respectively. In addition, statistical analysis of the distribution of tripeptides in the training samples showed that the position and number of some tripeptides in a sequence could affect the binding of TFs to methylated DNA. Finally, a free web server based on the proposed model was established and is available at https://bioinfor.nefu.edu.cn/TFPM/.
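
The two-step design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say whether the tripeptides overlap, how the word vectors are trained, or how the networks are configured, so the overlapping window, the 0.5 threshold, and the names `tripeptide_tokens`, `two_step_predict`, `is_tf` and `binds_methylated` are all assumptions made for illustration. The two scorers are plain callables standing in for the trained long short-term memory models.

```python
from typing import Callable, List

# Hypothetical scorer type: takes tripeptide tokens, returns a probability in [0, 1].
# In the paper these would be trained LSTM models; here they are stand-ins.
Scorer = Callable[[List[str]], float]


def tripeptide_tokens(sequence: str, stride: int = 1) -> List[str]:
    """Split a protein sequence into tripeptide 'words'.

    stride=1 gives overlapping tripeptides (an assumption; the abstract
    does not specify), stride=3 gives non-overlapping ones.
    """
    seq = sequence.strip().upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, stride)]


def two_step_predict(
    sequence: str,
    is_tf: Scorer,
    binds_methylated: Scorer,
    threshold: float = 0.5,  # assumed decision cutoff
) -> str:
    """Cascade: step 1 decides TF vs. non-TF; step 2 runs only for predicted TFs."""
    tokens = tripeptide_tokens(sequence)
    if is_tf(tokens) < threshold:
        return "non-TF"
    if binds_methylated(tokens) >= threshold:
        return "TF, binds methylated DNA"
    return "TF, does not bind methylated DNA"


if __name__ == "__main__":
    # Dummy scorers for demonstration only; they carry no biological meaning.
    always_tf = lambda tokens: 0.9
    never_binds = lambda tokens: 0.1
    print(tripeptide_tokens("MKVLA"))
    print(two_step_predict("MKVLA", always_tf, never_binds))
```

Because the second scorer is called only when the first one passes the threshold, an error in step one propagates to the final label, which is one reason the abstract reports the accuracy of each step separately.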
