Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Nature. 2023 Jun;618(7965):616-624. doi: 10.1038/s41586-023-06139-9. Epub 2023 May 31.
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding and computer vision by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
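The transfer-learning pattern the abstract describes — a large encoder pretrained once on a broad corpus, then fine-tuned on limited task-specific data — can be sketched in miniature. The code below is purely illustrative and is not the paper's method or the Geneformer API: a fixed random projection stands in for the frozen pretrained encoder, the data are synthetic, and only a small classification head is trained, mirroring how a downstream task with few labelled cells reuses pretrained representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder (illustrative only). In the pattern the
# paper describes, a model pretrained on ~30 million transcriptomes would map
# each cell's transcriptome to an embedding; here a frozen random projection
# plays that role.
N_GENES, EMB_DIM = 200, 32
W_pretrained = rng.normal(size=(N_GENES, EMB_DIM))  # frozen, never updated

def encode(expr):
    """Map expression vectors (cells x genes) to frozen embeddings."""
    return np.tanh(expr @ W_pretrained)

# Limited task-specific data (synthetic): e.g. 40 labelled cells for a
# hypothetical binary phenotype-classification task.
n = 40
X = rng.normal(size=(n, N_GENES))
true_w = rng.normal(size=EMB_DIM)
y = (encode(X) @ true_w > 0).astype(float)

# Fine-tune only a small logistic-regression head; the encoder stays frozen.
emb = encode(X)
w, b = np.zeros(EMB_DIM), 0.0
lr = 0.5
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
    grad = p - y
    w -= lr * emb.T @ grad / n
    b -= lr * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(emb @ w + b))) > 0.5) == y).mean()
print(f"training accuracy with a frozen encoder: {acc:.2f}")
```

Because the head is small and the encoder's weights are reused rather than relearned, even a few dozen labelled examples suffice here — the core appeal of fine-tuning in limited-data settings such as rare disease.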