DeepTFactor: A deep learning-based tool for the prediction of transcription factors.

Affiliations

Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Plus Program), Korea Advanced Institute of Science and Technology, 34141 Daejeon, Republic of Korea.

Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology, 34141 Daejeon, Republic of Korea.

Publication information

Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). doi: 10.1073/pnas.2021171118.

DOI: 10.1073/pnas.2021171118
PMID: 33372147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7812831/
Abstract

A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.
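
The abstract combines three technical ideas. A convolutional network turns a protein sequence into a TF/non-TF score, the gradients of that score with respect to the input indicate which residues drive the prediction, and the F1 score summarizes performance. The sketch below shows all three at toy scale, under stated assumptions. The 20-letter one-hot alphabet, the single convolution layer with ReLU, global max pooling and a logistic output, the random untrained weights, the example sequence, and the function names (`one_hot`, `conv_relu_maxpool`, `tf_score`, `input_saliency`, `f1_score`) are all hypothetical and do not come from DeepTFactor. The input gradients are approximated here by finite differences instead of backpropagation.

```python
import math
import random

# Hypothetical toy sketch: a hand-rolled 1D CNN scorer in the spirit of the abstract
# (CNN features -> TF probability; input gradients -> saliency). It is NOT
# DeepTFactor's architecture, weights, or training procedure.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq, max_len):
    """Encode a protein as a max_len x 20 matrix, truncating or zero-padding to max_len."""
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        if aa in INDEX:  # non-standard residues such as 'X' stay all-zero
            matrix[pos][INDEX[aa]] = 1.0
    return matrix


def conv_relu_maxpool(x, filt):
    """Slide one (width x 20) filter along the sequence, apply ReLU, then global max-pool."""
    width = len(filt)
    best = 0.0
    for start in range(len(x) - width + 1):
        act = sum(
            filt[k][j] * x[start + k][j]
            for k in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        best = max(best, act)  # best starts at 0.0, so this equals max over ReLU(act)
    return best


def tf_score(x, filters, weights, bias):
    """Logistic output over pooled filter activations: a probability-like TF score."""
    features = [conv_relu_maxpool(x, f) for f in filters]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def input_saliency(x, filters, weights, bias, eps=1e-4):
    """Finite-difference estimate of d(score)/d(input) at the residue present at each position."""
    base = tf_score(x, filters, weights, bias)
    saliency = []
    for row in x:
        grad = 0.0  # padded (all-zero) positions keep a saliency of 0.0
        for j, value in enumerate(row):
            if value == 1.0:
                row[j] = value + eps
                grad = (tf_score(x, filters, weights, bias) - base) / eps
                row[j] = value  # restore the original one-hot entry
        saliency.append(grad)
    return saliency


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary TF / non-TF labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)  # untrained random weights, for illustration only
    width, n_filters, max_len = 4, 3, 30
    filters = [
        [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(width)]
        for _ in range(n_filters)
    ]
    weights = [rng.uniform(-1, 1) for _ in range(n_filters)]
    x = one_hot("MKRISTLEELAKRLGVSKQTVSRAL", max_len)  # hypothetical input sequence
    print("TF score:", round(tf_score(x, filters, weights, 0.0), 4))
    sal = input_saliency(x, filters, weights, 0.0)
    print("most influential position:", max(range(max_len), key=lambda i: abs(sal[i])))
    print("F1:", f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # tp=2, fp=1, fn=1, so about 0.667
```

Running the script prints a score for the hypothetical sequence, the position whose perturbation changes that score the most, and the F1 value for a five-example toy labelling. In a trained model the filters would be learned from labelled TF and non-TF sequences, and an automatic-differentiation framework would compute the input gradients directly rather than by perturbation.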

Similar articles

1. DeepTFactor: A deep learning-based tool for the prediction of transcription factors.
   Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). doi: 10.1073/pnas.2021171118.
2. DeepReg: a deep learning hybrid model for predicting transcription factors in eukaryotic and prokaryotic genomes.
   Sci Rep. 2024 Apr 21;14(1):9155. doi: 10.1038/s41598-024-59487-5.
3. Nonconsensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes.
   PLoS Comput Biol. 2015 Aug 18;11(8):e1004429. doi: 10.1371/journal.pcbi.1004429. eCollection 2015 Aug.
4. Multi-Scale Capsule Network for Predicting DNA-Protein Binding Sites.
   IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1793-1800. doi: 10.1109/TCBB.2020.3025579. Epub 2021 Oct 7.
5. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data.
   PLoS One. 2009 Dec 1;4(12):e8155. doi: 10.1371/journal.pone.0008155.
6. A map of direct TF-DNA interactions in the human genome.
   Nucleic Acids Res. 2019 Feb 28;47(4):e21. doi: 10.1093/nar/gky1210.
7. Enhancing the interpretability of transcription factor binding site prediction using attention mechanism.
   Sci Rep. 2020 Aug 7;10(1):13413. doi: 10.1038/s41598-020-70218-4.
8. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility.
   BMC Bioinformatics. 2017 Jul 27;18(1):355. doi: 10.1186/s12859-017-1769-7.
9. High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab273.
10. TFregulomeR reveals transcription factors' context-specific features and functions.
   Nucleic Acids Res. 2020 Jan 24;48(2):e10. doi: 10.1093/nar/gkz1088.

Cited by

1. Identification of sepsis biomarkers through glutamine metabolism-mediated immune regulation: a comprehensive analysis employing mendelian randomization, multi-omics integration, and machine learning.
   Front Immunol. 2025 Aug 20;16:1640425. doi: 10.3389/fimmu.2025.1640425. eCollection 2025.
2. Active learning-guided optimization of cell-free biosensors for lead testing in drinking water.
   bioRxiv. 2025 Aug 22:2025.08.20.671382. doi: 10.1101/2025.08.20.671382.
3. Integrative bioinformatics and deep learning to identify common genetic pathways in Crohn's disease and ischemic cardiomyopathy.
   J Genet Eng Biotechnol. 2025 Sep;23(3):100529. doi: 10.1016/j.jgeb.2025.100529. Epub 2025 Jun 26.
4. ProtPhage: a deep learning framework for phage viral protein identification and functional annotation.
   Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf285.
5. Ovarian transcriptome analyses indicate that weak juvenile hormone signaling underlies the molecular basis of oogenesis deficiencies in mosquitoes.
   BMC Biol. 2025 Jun 9;23(1):160. doi: 10.1186/s12915-025-02266-z.
6. The gene regulatory mechanisms shaping the heterogeneity of venom production in the Cape coral snake.
   Genome Biol. 2025 May 19;26(1):130. doi: 10.1186/s13059-025-03602-w.
7. Microbial Transcription Factor-Based Biosensors: Innovations from Design to Applications in Synthetic Biology.
   Biosensors (Basel). 2025 Mar 31;15(4):221. doi: 10.3390/bios15040221.
8. MTD: A cloud-based omics database and interactive platform for . (title incomplete in source)
   Synth Syst Biotechnol. 2025 Apr 3;10(3):783-793. doi: 10.1016/j.synbio.2025.04.001. eCollection 2025 Sep.
9. Gene network centrality analysis identifies key regulators coordinating day-night metabolic transitions in PCC 7942 despite limited accuracy in predicting direct regulator-gene interactions.
   Front Microbiol. 2025 Mar 26;16:1569559. doi: 10.3389/fmicb.2025.1569559. eCollection 2025.
10. Identification and catalog of viral transcriptional regulators in human diseases.
   iScience. 2025 Feb 21;28(3):112081. doi: 10.1016/j.isci.2025.112081. eCollection 2025 Mar 21.
