Mouse-Geneformer: A deep learning model for mouse single-cell transcriptome and its cross-species utility.

Authors

Ito Keita, Hirakawa Tsubasa, Shigenobu Shuji, Fujiyoshi Hironobu, Yamashita Takayoshi

Affiliations

Graduate School of Engineering, Chubu University, Kasugai, Aichi, Japan.

Department of Artificial Intelligence and Robotics, Center for Mathematical Science and Artificial Intelligence, Chubu University, Kasugai, Aichi, Japan.

Publication information

PLoS Genet. 2025 Mar 19;21(3):e1011420. doi: 10.1371/journal.pgen.1011420. eCollection 2025 Mar.

DOI: 10.1371/journal.pgen.1011420
PMID: 40106407
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11964219/
Abstract

Deep learning techniques are increasingly utilized to analyze large-scale single-cell RNA sequencing (scRNA-seq) data, offering valuable insights from complex transcriptome datasets. Geneformer, a pre-trained model using a Transformer Encoder architecture and human scRNA-seq datasets, has demonstrated remarkable success in human transcriptome analysis. However, given the prominence of the mouse, Mus musculus, as a primary mammalian model in biological and medical research, there is an acute need for a mouse-specific version of Geneformer. In this study, we developed a mouse-specific Geneformer (mouse-Geneformer) by constructing a large transcriptome dataset consisting of 21 million mouse scRNA-seq profiles and pre-training Geneformer on this dataset. The mouse-Geneformer effectively models the mouse transcriptome and, upon fine-tuning for downstream tasks, enhances the accuracy of cell type classification. In silico perturbation experiments using mouse-Geneformer successfully identified disease-causing genes that have been validated in in vivo experiments. These results demonstrate the feasibility of analyzing mouse data with mouse-Geneformer and highlight the robustness of the Geneformer architecture, applicable to any species with large-scale transcriptome data available. Furthermore, we found that mouse-Geneformer can analyze human transcriptome data in a cross-species manner. After the ortholog-based gene name conversion, the analysis of human scRNA-seq data using mouse-Geneformer, followed by fine-tuning with human data, achieved cell type classification accuracy comparable to that obtained using the original human Geneformer. In in silico simulation experiments using human disease models, we obtained results similar to human-Geneformer for the myocardial infarction model but only partially consistent results for the COVID-19 model, a trait unique to humans (laboratory mice are not susceptible to SARS-CoV-2). 
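The in silico perturbation experiments described above score a gene by how much removing it changes the model's representation of a cell. The sketch below illustrates only that idea and is not Geneformer or mouse-Geneformer. It assumes a cell is given as a list of gene symbols already ordered by expression rank, and it replaces the trained Transformer with a hypothetical toy embedding (`cell_embedding`, a rank-weighted mean of fixed pseudo-random per-gene vectors). Measuring the shift as 1 minus cosine similarity is likewise an assumption made for illustration, and the example cell is made up.

```python
import math
import random
from typing import List

DIM = 16  # toy embedding width (assumption, not the real model's size)


def gene_vector(gene: str) -> List[float]:
    """Hypothetical fixed vector per gene; stands in for learned token embeddings."""
    rng = random.Random(gene)  # string seeds give the same vector on every run
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def cell_embedding(ranked_genes: List[str]) -> List[float]:
    """Toy cell embedding: rank-weighted mean of gene vectors (NOT a Transformer)."""
    if not ranked_genes:
        raise ValueError("a cell needs at least one gene")
    total = [0.0] * DIM
    weight_sum = 0.0
    for rank, gene in enumerate(ranked_genes):
        weight = 1.0 / (rank + 1)  # higher-ranked genes contribute more
        for i, value in enumerate(gene_vector(gene)):
            total[i] += weight * value
        weight_sum += weight
    return [value / weight_sum for value in total]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deletion_shift(ranked_genes: List[str], gene: str) -> float:
    """Embedding shift (1 - cosine similarity) caused by deleting `gene` from the cell."""
    if gene not in ranked_genes:
        return 0.0  # nothing to delete, so no shift
    perturbed = [g for g in ranked_genes if g != gene]
    return 1.0 - cosine(cell_embedding(ranked_genes), cell_embedding(perturbed))


# Illustrative cardiomyocyte-like mouse cell (made-up ranking, not real data).
cell = ["Myh6", "Tnnt2", "Nppa", "Actc1", "Gata4"]
scores = {g: deletion_shift(cell, g) for g in cell}
for gene, shift in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{shift:.4f}")
```

Genes are then ranked by their shift; a larger value means the toy embedding depends more on that gene. In the published work the model, gene ranking, and statistics are those of Geneformer, not this stand-in.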
These findings suggest the potential for cross-species application of the Geneformer model while emphasizing the importance of species-specific models for capturing the full complexity of disease mechanisms. Despite the existence of the original Geneformer tailored for humans, human research could benefit from mouse-Geneformer due to its inclusion of samples that are ethically or technically inaccessible for humans, such as embryonic tissues and certain disease models. Additionally, this cross-species approach indicates potential use for non-model organisms, where obtaining large-scale single-cell transcriptome data is challenging.
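The cross-species analysis depends on converting human gene names to their mouse orthologs before the data are passed to mouse-Geneformer. A minimal sketch of that step follows. It assumes a one-to-one human-to-mouse ortholog table is already available as a dictionary (the four-entry `ORTHOLOGS` table is an illustrative excerpt; in practice it would come from an ortholog database), and `GENE_X` is a hypothetical placeholder for a gene with no mouse ortholog. Dropping unmapped genes while keeping the original rank order is an assumption, not necessarily the authors' exact rule.

```python
from typing import Dict, List, Tuple

# Illustrative excerpt of a human -> mouse one-to-one ortholog table (assumption).
ORTHOLOGS: Dict[str, str] = {
    "MYH6": "Myh6",
    "TNNT2": "Tnnt2",
    "NPPA": "Nppa",
    "GATA4": "Gata4",
}


def convert_to_mouse(
    human_genes: List[str], orthologs: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Map human gene symbols to mouse symbols, preserving input order.

    Returns (mouse_genes, dropped). Genes without an ortholog, or whose mouse
    ortholog is already present, are dropped so each mouse token appears once.
    """
    mouse_genes: List[str] = []
    dropped: List[str] = []
    seen = set()
    for gene in human_genes:
        mouse = orthologs.get(gene)
        if mouse is None or mouse in seen:
            dropped.append(gene)
            continue
        seen.add(mouse)
        mouse_genes.append(mouse)
    return mouse_genes, dropped


# A human cell as a rank-ordered gene list; GENE_X is a hypothetical unmapped gene.
mapped, unmapped = convert_to_mouse(["NPPA", "MYH6", "GENE_X", "TNNT2"], ORTHOLOGS)
print(mapped)    # ['Nppa', 'Myh6', 'Tnnt2']
print(unmapped)  # ['GENE_X']
```

Preserving order matters because the converted list is still meant to represent the cell's expression ranking; the dropped list makes it easy to report how much of each cell was lost in conversion.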

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7185/11964219/109b4b6f6848/pgen.1011420.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7185/11964219/886c107b8f74/pgen.1011420.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7185/11964219/054a705a743f/pgen.1011420.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7185/11964219/60a334377ed1/pgen.1011420.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7185/11964219/fc0b91406c2b/pgen.1011420.g005.jpg

Similar articles

1
Mouse-Geneformer: A deep learning model for mouse single-cell transcriptome and its cross-species utility.
PLoS Genet. 2025 Mar 19;21(3):e1011420. doi: 10.1371/journal.pgen.1011420. eCollection 2025 Mar.
2
Enhancing single-cell classification accuracy using image conversion and deep learning.
Yi Chuan. 2025 Mar;47(3):382-392. doi: 10.16288/j.yczz.24-213.
3
scDTL: enhancing single-cell RNA-seq imputation through deep transfer learning with bulk cell information.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae555.
4
Graph contrastive learning as a versatile foundation for advanced scRNA-seq data analysis.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae558.
5
A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.
RNA. 2020 Oct;26(10):1303-1319. doi: 10.1261/rna.074427.119. Epub 2020 Jun 12.
6
Explainable deep neural networks for predicting sample phenotypes from single-cell transcriptomics.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae673.
7
scDeepSort: a pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network.
Nucleic Acids Res. 2021 Dec 2;49(21):e122. doi: 10.1093/nar/gkab775.
8
SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae571.
9
XgCPred: Cell type classification using XGBoost-CNN integration and exploiting gene expression imaging in single-cell RNAseq data.
Comput Biol Med. 2024 Oct;181:109066. doi: 10.1016/j.compbiomed.2024.109066. Epub 2024 Aug 24.
10
Continually adapting pre-trained language model to universal annotation of single-cell RNA-seq data.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae047.

Cited by

1
TissueFormer: a neural network for labeling tissue from grouped single-cell RNA profiles.
bioRxiv. 2025 Aug 19:2025.08.17.670735. doi: 10.1101/2025.08.17.670735.
