GET: a foundation model of transcription across human cell types.

Authors

Fu Xi, Mo Shentong, Buendia Alejandro, Laurent Anouchka, Shao Anqi, Del Mar Alvarez-Torres Maria, Yu Tianji, Tan Jimin, Su Jiayu, Sagatelian Romella, Ferrando Adolfo A, Ciccia Alberto, Lan Yanyan, Owens David M, Palomero Teresa, Xing Eric P, Rabadan Raul

Affiliations

Department of Systems Biology, Columbia University, New York, NY, USA.

Department of Biomedical Informatics, Columbia University, New York, NY, USA.

Publication details

bioRxiv. 2024 Jul 3:2023.09.24.559168. doi: 10.1101/2023.09.24.559168.

DOI:10.1101/2023.09.24.559168
PMID:39005360
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11244937/
Abstract

Transcriptional regulation, involving the complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate in unseen cell types and conditions. Here, we introduce GET, an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types. GET showcases remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovering universal and cell type specific transcription factor interaction networks. We evaluated its performance on prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors. Specifically, we show GET outperforms current models in predicting lentivirus-based massively parallel reporter assay readout with reduced input data. In fetal erythroblasts, we identify distal (>1Mbp) regulatory regions that were missed by previous models. In B cells, we identify a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukemia-risk predisposing germline mutation. In sum, we provide a generalizable and accurate model for transcription together with catalogs of gene regulation and transcription factor interactions, all with cell type specificity.
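The abstract does not say how "experimental-level accuracy" is measured. One common way to make such a claim concrete is to compare the Pearson correlation between predicted and observed expression with the correlation between two experimental replicates. The sketch below illustrates that comparison on hypothetical toy numbers; the function name `pearson` and all values are assumptions for illustration, not the authors' evaluation code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical toy log-expression values for five genes; NOT data from the paper.
replicate_1 = [0.5, 1.2, 2.8, 3.1, 4.0]
replicate_2 = [0.7, 1.0, 2.5, 3.4, 3.9]
predicted = [0.6, 1.4, 2.6, 3.0, 4.2]

# Replicate-vs-replicate correlation serves as the "experimental" reference level.
replicate_r = pearson(replicate_1, replicate_2)
# Model-vs-observation correlation is what gets compared against that reference.
model_r = pearson(predicted, replicate_1)
print(f"replicate r = {replicate_r:.3f}, model r = {model_r:.3f}")
```

Under this framing, a model reaches "experimental-level" accuracy when `model_r` is comparable to `replicate_r` on held-out cell types; the paper itself should be consulted for the exact metric and setup.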

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/6b4165459262/nihpp-2023.09.24.559168v2-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/3e55185bb510/nihpp-2023.09.24.559168v2-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/86e18a3a71cf/nihpp-2023.09.24.559168v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/f05a5da71aed/nihpp-2023.09.24.559168v2-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/a1370d5594f0/nihpp-2023.09.24.559168v2-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44fe/11244937/3463172af26a/nihpp-2023.09.24.559168v2-f0006.jpg

Similar articles

1
GET: a foundation model of transcription across human cell types.
bioRxiv. 2024 Jul 3:2023.09.24.559168. doi: 10.1101/2023.09.24.559168.
2
A foundation model of transcription across human cell types.
Nature. 2025 Jan;637(8047):965-973. doi: 10.1038/s41586-024-08391-z. Epub 2025 Jan 8.
3
Integrating Prior Knowledge Using Transformer for Gene Regulatory Network Inference.
Adv Sci (Weinh). 2025 Jan;12(3):e2409990. doi: 10.1002/advs.202409990. Epub 2024 Nov 28.
4
Modeling transcriptional regulation using gene regulatory networks based on multi-omics data sources.
BMC Bioinformatics. 2021 Apr 19;22(1):200. doi: 10.1186/s12859-021-04126-3.
5
Regulatory chromatin landscape in roots uncovered by coupling INTACT and ATAC-seq.
Plant Methods. 2018 Dec 20;14:113. doi: 10.1186/s13007-018-0381-9. eCollection 2018.
6
REUNION: transcription factor binding prediction and regulatory association inference from single-cell multi-omics data.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i567-i575. doi: 10.1093/bioinformatics/btae234.
7
A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity.
Genome Res. 2017 Jan;27(1):38-52. doi: 10.1101/gr.212092.116. Epub 2016 Nov 9.
8
Predicting cell-type-specific gene expression from regions of open chromatin.
Genome Res. 2012 Sep;22(9):1711-22. doi: 10.1101/gr.135129.111.
9
MINI-AC: inference of plant gene regulatory networks using bulk or single-cell accessible chromatin profiles.
Plant J. 2024 Jan;117(1):280-301. doi: 10.1111/tpj.16483. Epub 2023 Oct 3.
10
A single-cell atlas of chromatin accessibility in the human genome.
Cell. 2021 Nov 24;184(24):5985-6001.e19. doi: 10.1016/j.cell.2021.10.024. Epub 2021 Nov 12.