
Benchmarking transcription factor binding site prediction models: a comparative analysis on synthetic and biological data.

Authors

Tognon Manuel, Kumbara Alisa, Betti Andrea, Ruggeri Lorenzo, Giugno Rosalba

Affiliations

Computer Science Department, University of Verona, Strada Le Grazie 15, Verona, VR 37134, Italy.

Department of Engineering for Innovation Medicine, University of Verona, Strada Le Grazie 15, Verona, VR 37134, Italy.

Publication

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf363.

PMID: 40702706

Abstract

Transcription factors (TFs) are essential regulatory proteins that control cellular transcriptional states by binding to specific DNA sequences known as transcription factor binding sites (TFBSs) or motifs. Accurate TFBS identification is crucial for unraveling the regulatory mechanisms that drive cellular dynamics. Over the years, various computational approaches have been developed to model TFBSs, with position weight matrices (PWMs) being one of the most widely adopted methods. PWMs provide a probabilistic framework by representing nucleotide frequencies at every position within the binding site. While effective and interpretable, PWMs face significant limitations, such as their inability to capture positional dependencies or model complex interactions. To address these limitations, more advanced methods, such as support vector machine (SVM)-based and deep learning (DL)-based models, have been introduced. Leveraging human ChIP-seq data from ENCODE, we systematically benchmarked the predictive performance of PWM-, SVM-, and DL-based models across different scenarios. We evaluated the impact of key factors such as training dataset size, sequence length, and kernel functions (for SVMs) on model performance. Additionally, we explored the impact of synthetic versus real biological background data during model training. Our analysis highlights the strengths and limitations of each approach under different conditions, providing practical guidance for selecting and tailoring models to specific biological datasets. To complement our analysis, we present a comprehensive database of pretrained SVM models for TFBS detection, trained on human ChIP-seq data from diverse cell lines and tissues. This resource aims to facilitate broader adoption of SVM-based methods in TFBS prediction and enhance their practical utility in regulatory genomics research.
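
As an illustration of the PWM idea described in the abstract, the following is a minimal sketch, not the authors' implementation. The aligned sites in `SITES`, the pseudocount of 1, and the uniform background of 0.25 are all assumptions chosen for the example, and the function names are hypothetical. It builds a log-odds matrix from per-position nucleotide frequencies and scans a sequence for the best-scoring window.

```python
import math

# Hypothetical aligned binding sites (toy data, not taken from the paper).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one dict per motif position mapping base -> log2-odds score.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in ALPHABET}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position scores; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best_score, start_index)."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


pwm = build_pwm(SITES)
score, start = best_hit(pwm, "AAAATGACTCAGGG")
print(f"best window starts at {start} with log-odds score {score:.2f}")
```

Because `score_window` is a plain sum over columns, the score of a base at one position never depends on the base at another. That additivity is the limitation the abstract refers to when it says PWMs cannot capture positional dependencies.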
