Learning interpretable representations of single-cell multi-omics data with multi-output Gaussian processes.

Authors

Moslehi Zahra, AmeriFar Sareh, de Azevedo Kevin, Buettner Florian

Affiliations

German Cancer Consortium (DKTK), partner site Frankfurt/Mainz, a partnership between DKFZ and UCT Frankfurt-Marburg, 60590 Frankfurt am Main, Germany.

German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.

Publication information

Nucleic Acids Res. 2025 Jul 19;53(14). doi: 10.1093/nar/gkaf630.

DOI: 10.1093/nar/gkaf630
PMID: 40694853
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12282953/
Abstract

Learning representations of single-cell genomics data is challenging due to the nonlinear and often multi-modal nature of the data on one hand and the need for interpretable representations on the other hand. Existing approaches tend to focus either on interpretability aspects via linear matrix factorization or on maximizing expressive power via neural network-based embeddings using black-box variational autoencoders or graph embedding approaches. We address this trade-off between expressive power and interpretability by introducing a novel approach that combines highly expressive representation learning via an embedding layer with interpretable multi-output Gaussian processes within a unified framework. In our model, we learn distinct representations for samples (cells) and features (genes) from multi-modal single-cell data. We demonstrate that even a few interpretable latent dimensions can effectively capture the underlying structure of the data. Our model yields interpretable relationships between groups of cells and their associated marker genes: leveraging a gene relevance map, we establish connections between cell clusters (e.g. specific cell types) and feature clusters (e.g. marker genes for those specific cell types) within the learned latent spaces of cells and features.
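The abstract does not spell out the model equations. As a rough illustration only, the sketch below shows one common way to build a multi-output Gaussian process over a cells-by-genes matrix: a separable (Kronecker) covariance formed from one kernel over cell latent coordinates and one over gene latent coordinates. The squared-exponential kernel, the separable structure, the random latent coordinates, and every name here (`rbf_kernel`, `cell_latents`, `gene_latents`) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Squared-exponential kernel between all pairs of rows of x."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


rng = np.random.default_rng(0)
n_cells, n_genes, latent_dim = 6, 4, 2

# Hypothetical latent coordinates. In a trained model these would be
# learned (e.g. by an embedding layer); here they are random placeholders.
cell_latents = rng.normal(size=(n_cells, latent_dim))
gene_latents = rng.normal(size=(n_genes, latent_dim))

k_cells = rbf_kernel(cell_latents)  # shape (n_cells, n_cells)
k_genes = rbf_kernel(gene_latents)  # shape (n_genes, n_genes)

# Assumed separable covariance over all (cell, gene) pairs:
# cov[(i, j), (k, l)] = k_cells[i, k] * k_genes[j, l]
k_full = np.kron(k_cells, k_genes)

# Draw one expression-like matrix from the GP prior.
# A small jitter keeps the covariance numerically positive definite.
jitter = 1e-6 * np.eye(n_cells * n_genes)
flat_sample = rng.multivariate_normal(np.zeros(n_cells * n_genes), k_full + jitter)
expression = flat_sample.reshape(n_cells, n_genes)
```

Under this assumed structure, cells that lie close together in the cell latent space get correlated expression profiles, and genes that lie close together in the gene latent space co-vary across cells. This is the kind of link between cell clusters and feature clusters that the abstract describes.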

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/d4a787043e6d/gkaf630figgra1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/cc61dff92142/gkaf630fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/1f84027d52ce/gkaf630fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/8f2bd065714c/gkaf630fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/ca007ed732ae/gkaf630fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/7aca3cf15348/gkaf630fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/416d31452158/gkaf630fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/4bb32e57db50/gkaf630fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/63fe2730046b/gkaf630fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/3498f5e7f554/gkaf630fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9318/12282953/56a75d56f786/gkaf630fig10.jpg

