Transfer learning enables predictions in network biology.

Affiliations

Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.

Cardiovascular Disease Initiative and Precision Cardiology Laboratory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Publication information

Nature. 2023 Jun;618(7965):616-624. doi: 10.1038/s41586-023-06139-9. Epub 2023 May 31.

Abstract

Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding and computer vision by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
