TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation.

Authors

Gaffar Saima, Chong Kil To, Tayara Hilal

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea.

Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju 54896, Republic of Korea.

Publication information

Int J Mol Sci. 2025 Apr 29;26(9):4234. doi: 10.3390/ijms26094234.

Abstract

Transcription factors (TFs) are fundamental regulators of gene expression and perform diverse functions in cellular processes. The management of three-dimensional (3D) genome conformation and gene expression relies primarily on TFs. They recruit transcriptional machinery to the enhancers or promoters of specific genes, thereby activating or inhibiting transcription. Identifying TFs is therefore a significant step towards understanding cellular gene expression mechanisms. Because experimental methods are time-consuming and labor-intensive, computational models are essential. In this work, we introduce a two-layer prediction framework based on a support vector machine (SVM) that uses the latent space representation of a protein language model, ProtBert. The first layer predicts whether a protein is a TF, and the second layer predicts whether a TF prefers binding to methylated deoxyribonucleic acid (DNA), that is, whether it is a TFPM. We also tested the proposed method on an imbalanced database. In detecting TFs and TFPMs, the proposed model consistently outperformed state-of-the-art approaches, as shown by performance comparisons in cross-validation analysis and independent tests.
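The two-layer design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D toy vectors stand in for ProtBert embeddings, the small `LinearSVM` class is a standard-library stand-in for a full SVM (the abstract does not state the kernel or hyperparameters), and the names `classify`, `cluster`, `layer1`, and `layer2` are hypothetical. Only the cascade logic reflects the text: layer 1 decides TF versus non-TF, and layer 2, trained on TFs only, decides whether a TF is a TFPM.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinearSVM:
    """Toy linear SVM: stochastic subgradient descent on the L2-regularised hinge loss."""

    def __init__(self, lam: float = 0.001, lr: float = 0.1, epochs: int = 100, seed: int = 0):
        self.lam, self.lr, self.epochs, self.seed = lam, lr, epochs, seed
        self.w: List[float] = []
        self.b = 0.0

    def decision(self, x: Vector) -> float:
        return dot(self.w, x) + self.b

    def predict(self, x: Vector) -> bool:
        return self.decision(x) > 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LinearSVM":
        """Train on labels in {-1, +1}."""
        rng = random.Random(self.seed)
        self.w, self.b = [0.0] * len(X[0]), 0.0
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                violated = y[i] * self.decision(X[i]) < 1.0
                # L2 shrinkage on every step; hinge step only when the margin is violated.
                self.w = [wj * (1.0 - self.lr * self.lam) for wj in self.w]
                if violated:
                    self.w = [wj + self.lr * y[i] * xj for wj, xj in zip(self.w, X[i])]
                    self.b += self.lr * y[i]
        return self


def classify(x: Vector, layer1: LinearSVM, layer2: LinearSVM) -> str:
    """Cascade: layer 2 is consulted only for proteins that layer 1 calls TFs."""
    if not layer1.predict(x):
        return "non-TF"
    return "TFPM" if layer2.predict(x) else "TF (not TFPM)"


def cluster(center: Vector, n: int, rng: random.Random, spread: float = 0.3) -> List[List[float]]:
    """Synthetic stand-in for embeddings: Gaussian points around a centre."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


# Assumed toy data; real inputs would be ProtBert embeddings of protein sequences.
rng = random.Random(42)
non_tf = cluster((-2.0, 0.0), 40, rng)
tf_other = cluster((2.0, -2.0), 20, rng)
tfpm = cluster((2.0, 2.0), 20, rng)

# Layer 1 sees all proteins; layer 2 is trained on TFs only.
layer1 = LinearSVM().fit(non_tf + tf_other + tfpm, [-1] * 40 + [+1] * 40)
layer2 = LinearSVM().fit(tf_other + tfpm, [-1] * 20 + [+1] * 20)

for query in ([-2.0, 0.0], [2.0, 2.0], [2.0, -2.0]):
    print(query, classify(query, layer1, layer2))
```

Training the second layer only on TFs keeps its decision boundary focused on the methylation-preference question, at the cost that any layer 1 false negative can never be recovered downstream.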

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ac4/12071566/4f2b1c67aaa1/ijms-26-04234-g001.jpg