Interpretable Autoencoders Trained on Single Cell Sequencing Data Can Transfer Directly to Data from Unseen Tissues.

Affiliations

Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark.

Center for Surgical Science, Zealand University Hospital, Lykkebækvej 1, 4600 Koege, Denmark.

Publication information

Cells. 2021 Dec 28;11(1):85. doi: 10.3390/cells11010085.

DOI: 10.3390/cells11010085

PMID: 35011647

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8750521/

Abstract

Autoencoders have been used to model single-cell mRNA-sequencing data with the purpose of denoising, visualization, data simulation, and dimensionality reduction. We, and others, have shown that autoencoders can be explainable models and interpreted in terms of biology. Here, we show that such autoencoders can generalize to the extent that they can transfer directly without additional training. In practice, we can extract biological modules, denoise, and classify data correctly from an autoencoder that was trained on a different dataset and with different cells (a foreign model). We deconvoluted the biological signal encoded in the bottleneck layer of scRNA-models using saliency maps and mapped salient features to biological pathways. Biological concepts could be associated with specific nodes and interpreted in relation to biological pathways. Even in this unsupervised framework, with no prior information about cell types or labels, the specific biological pathways deduced from the model were in line with findings in previous research. It was hypothesized that autoencoders could learn and represent meaningful biology; here, we show with a systematic experiment that this is true and even transcends the training data. This means that carefully trained autoencoders can be used to assist the interpretation of new unseen data.
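The saliency-map step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-layer tanh encoder, the random stand-in weights, the placeholder gene names, and the helpers `encode`, `bottleneck_saliency`, and `top_salient_genes` are all hypothetical. The sketch takes the gradient of one bottleneck node with respect to each input gene and ranks genes by absolute gradient; in a real analysis the top-ranked genes would then be passed to a pathway enrichment tool.

```python
import math
import random


def encode(x, W, b):
    """Toy one-layer encoder: bottleneck activations z_k = tanh(w_k . x + b_k)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bk)
        for row, bk in zip(W, b)
    ]


def bottleneck_saliency(x, W, b, node):
    """Gradient of bottleneck node `node` with respect to each input gene.

    For z_k = tanh(w_k . x + b_k), dz_k/dx_j = (1 - z_k**2) * W[k][j].
    """
    z = encode(x, W, b)[node]
    return [(1.0 - z * z) * w for w in W[node]]


def top_salient_genes(saliency, gene_names, n=3):
    """Return the n gene names with the largest absolute saliency."""
    order = sorted(range(len(saliency)), key=lambda i: -abs(saliency[i]))
    return [gene_names[i] for i in order[:n]]


# Hypothetical setup: 6 genes, 2 bottleneck nodes, random stand-in weights.
rng = random.Random(0)
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
W = [[rng.gauss(0.0, 1.0) for _ in gene_names] for _ in range(2)]
b = [0.0, 0.0]
x = [rng.random() for _ in gene_names]  # one cell's normalized expression

saliency = bottleneck_saliency(x, W, b, node=0)
print(top_salient_genes(saliency, gene_names))
```

Because the saliency depends only on the encoder weights and the input cell, the same computation can in principle be applied to cells from a dataset the model was never trained on, which is the transfer setting the abstract describes.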

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81bd/8750521/834bb8155550/cells-11-00085-g001.jpg
