DeepTFactor: A deep learning-based tool for the prediction of transcription factors.

Affiliations

Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Plus Program), Korea Advanced Institute of Science and Technology, 34141 Daejeon, Republic of Korea.

Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology, 34141 Daejeon, Republic of Korea.

Publication information

Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). doi: 10.1073/pnas.2021171118.

Abstract

A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.
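The performance figures above are F1 scores, the harmonic mean of precision and recall for a binary classifier (here, TF versus non-TF). The following is a minimal sketch of how such a score is computed. The function name `f1_score` and the toy labels are illustrative assumptions; they are not taken from the DeepTFactor code or its evaluation data.

```python
def f1_score(y_true, y_pred):
    """Return the F1 score for binary labels (1 = TF, 0 = non-TF)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(f"F1 = {f1_score(y_true, y_pred):.4f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class (TFs) is much rarer than the negative class, as is typical for proteome-wide prediction.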
