Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts.

Authors

Tripathi Aakash, Nielsen Ian E, Umer Muhammad, Ramachandran Ravi P, Rasool Ghulam

Affiliations

Machine Learning, Moffitt Cancer Center, 12902 USF Magnolia Drive, Tampa, FL, 33612, USA.

Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ, 08028.

Publication information

ArXiv. 2025 Jul 18:arXiv:2507.09754v2.

PMID: 40709306
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12288655/

Abstract

Transcription Factor Binding Site (TFBS) prediction is crucial for understanding gene regulation and various biological processes. This study introduces a novel Mixture of Experts (MoE) approach for TFBS prediction, integrating multiple pre-trained Convolutional Neural Network (CNN) models, each specializing in different TFBS patterns. We evaluate the performance of our MoE model against individual expert models on both in-distribution and out-of-distribution (OOD) datasets, using six randomly selected transcription factors (TFs) for OOD testing. Our results demonstrate that the MoE model achieves competitive or superior performance across diverse TF binding sites, particularly excelling in OOD scenarios. The Analysis of Variance (ANOVA) statistical test confirms the significance of these performance differences. Additionally, we introduce ShiftSmooth, a novel attribution mapping technique that provides more robust model interpretability by considering small shifts in input sequences. Through comprehensive explainability analysis, we show that ShiftSmooth offers superior attribution for motif discovery and localization compared to traditional Vanilla Gradient methods. Our work presents an efficient, generalizable, and interpretable solution for TFBS prediction, potentially enabling new discoveries in genome biology and advancing our understanding of transcriptional regulation.
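
The abstract does not spell out how the expert outputs are combined, so the following is only a minimal sketch of a generic softmax-gated mixture, not the authors' architecture. The toy motif-matching experts stand in for the pre-trained CNNs, and the names `motif_expert`, `base_fractions`, `moe_predict`, the hand-set `gate_matrix`, and the example motifs and sequence are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"


def motif_expert(motif):
    """Build a toy expert that scores the best window match to `motif` in [0, 1].

    Hypothetical stand-in for a pre-trained CNN expert.
    """
    def score(seq):
        k = len(motif)
        best = 0.0
        for i in range(len(seq) - k + 1):
            matches = sum(1 for a, b in zip(seq[i:i + k], motif) if a == b)
            best = max(best, matches / k)
        return best
    return score


def base_fractions(seq):
    """Gate input features (an assumption): fraction of A, C, G, T in the sequence."""
    return [seq.count(base) / len(seq) for base in BASES]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(seq, experts, gate_matrix):
    """Return (combined score, gate weights) for one sequence.

    gate_matrix has one row of four coefficients per expert. In a trained
    model these would be learned; here they are supplied by hand.
    """
    features = base_fractions(seq)
    logits = [sum(w * f for w, f in zip(row, features)) for row in gate_matrix]
    weights = softmax(logits)
    scores = [expert(seq) for expert in experts]
    combined = sum(w * s for w, s in zip(weights, scores))
    return combined, weights


# Two experts built from arbitrary example motifs.
experts = [motif_expert("TGACTCA"), motif_expert("CACGTG")]
uniform_gate = [[0.0] * 4, [0.0] * 4]  # zero logits give equal gate weights
combined, weights = moe_predict("AAATGACTCAGGG", experts, uniform_gate)
print(combined, weights)
```

Because the gate weights sum to one, the combined score is always a convex combination of the expert scores. A gate that depends on the input lets different experts dominate for different sequences, which is the general motivation for a mixture over a fixed average.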

Similar articles

1. Explainable AI in Genomics: Transcription Factor Binding Site Prediction with Mixture of Experts.
   ArXiv. 2025 Jul 18:arXiv:2507.09754v2.
2. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
   Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
3. Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
   Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
4. Analyzing explainability of YOLO-based breast cancer detection using heat map visualizations.
   Quant Imaging Med Surg. 2025 Jul 1;15(7):6252-6271. doi: 10.21037/qims-2024-2911. Epub 2025 Jun 30.
5. Advancing personalized healthcare: leveraging explainable AI for BPPV risk assessment.
   Health Inf Sci Syst. 2024 Nov 24;13(1):1. doi: 10.1007/s13755-024-00317-3. eCollection 2025 Dec.
6. Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy.
   Med Phys. 2025 Jul;52(7):e17915. doi: 10.1002/mp.17915. Epub 2025 May 31.
7. BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.
   Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.
8. Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology.
   bioRxiv. 2024 Oct 21:2023.11.16.567384. doi: 10.1101/2023.11.16.567384.
9. A Responsible Framework for Assessing, Selecting, and Explaining Machine Learning Models in Cardiovascular Disease Outcomes Among People With Type 2 Diabetes: Methodology and Validation Study.
   JMIR Med Inform. 2025 Jun 27;13:e66200. doi: 10.2196/66200.
10. Benchmarking transcription factor binding site prediction models: a comparative analysis on synthetic and biological data.
    Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf363.
