Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data.

Author information

Centre for Genomic Medicine Rigshospitalet, University of Copenhagen, Copenhagen, Denmark.

Section for Cognitive Systems Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark.

Publication information

BMC Bioinformatics. 2019 Jul 8;20(1):379. doi: 10.1186/s12859-019-2952-9.

Abstract

BACKGROUND

Unsupervised machine learning methods (deep learning) have proven useful on noisy single-cell mRNA sequencing (scRNA-seq) data, where the models generalize well despite the zero-inflation of the data. One class of neural networks, autoencoders, has been useful for denoising single-cell data, imputing missing values, and reducing dimensionality.

RESULTS

Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: with specialized training, the autoencoder is able not only to generalize over the data but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. From scRNA-seq data, our model can delineate the biologically meaningful modules that govern a dataset and indicate which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets.
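To make the idea of a representation layer whose units are read out per cell more concrete, the following is a minimal sketch under stated assumptions, not the authors' model or training procedure. The simulated data, the tanh activation, the mean-squared-error loss, the layer size, and every function name (`simulate_counts`, `train_autoencoder`, `encode`) are illustrative choices that do not come from the abstract, and the "specialized training" it mentions is not reproduced here.

```python
import numpy as np


def simulate_counts(n_cells=200, n_genes=30, n_modules=3, dropout=0.3, seed=0):
    """Toy zero-inflated counts generated from a few hidden gene modules (assumed data)."""
    rng = np.random.default_rng(seed)
    activity = rng.random((n_cells, n_modules))        # module activity per cell
    loadings = rng.random((n_modules, n_genes)) * 5.0  # gene weights per module
    counts = rng.poisson(activity @ loadings).astype(float)
    counts[rng.random(counts.shape) < dropout] = 0.0   # extra zeros (dropout events)
    return np.log1p(counts)                            # log-transform the counts


def train_autoencoder(x, n_hidden=3, lr=0.01, epochs=2000, seed=0):
    """Fit a one-hidden-layer autoencoder by plain gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_genes, n_hidden))
    w_dec = rng.normal(0.0, 0.1, (n_hidden, n_genes))
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc)                      # representation layer
        err = h @ w_dec - x                         # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size                # gradient of MSE w.r.t. reconstruction
        g_dec = h.T @ g_out                         # gradient for decoder weights
        g_pre = (g_out @ w_dec.T) * (1.0 - h ** 2)  # backpropagate through tanh
        g_enc = x.T @ g_pre                         # gradient for encoder weights
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec, losses


def encode(x, w_enc):
    """Return representation-layer activations: one row per cell, one column per unit."""
    return np.tanh(x @ w_enc)


x = simulate_counts()
w_enc, w_dec, losses = train_autoencoder(x)
activity = encode(x, w_enc)  # which hidden units respond in which cell
```

Each row of `activity` gives the hidden-unit responses for one cell, which is the kind of per-cell readout the paragraph above describes. Whether such units correspond to biological modules depends on the training scheme, which this sketch does not attempt to replicate.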

CONCLUSIONS

We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparing the modules with gene signatures of canonical pathways, we see that they are directly interpretable. This discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. Compared with other dimensionality reduction methods, or with supervised classification models, our approach has two benefits: it handles the zero-inflated nature of scRNA-seq well, and it validates that the model captures relevant information by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.
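One common way to compare a module's genes with reference gene signatures is an overlap test. The sketch below uses a one-sided hypergeometric test as an assumed stand-in; the abstract does not state which comparison the authors used. The gene identifiers, the set names `SET_A` and `SET_B`, and both function names are hypothetical placeholders, not real Hallmark gene sets.

```python
from math import comb


def hypergeom_pvalue(overlap, n_module, n_set, n_background):
    """P(X >= overlap) when n_module genes are drawn without replacement from
    n_background genes, of which n_set belong to the reference gene set."""
    upper = min(n_module, n_set)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_set, k) * comb(n_background - n_set, n_module - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def score_module(module_genes, gene_sets, background):
    """Rank reference gene sets by how strongly they overlap one module."""
    universe = set(background)
    module = set(module_genes) & universe
    results = []
    for name, genes in gene_sets.items():
        ref = set(genes) & universe
        k = len(module & ref)
        p = hypergeom_pvalue(k, len(module), len(ref), len(universe))
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


# Hypothetical toy inputs: 100 background genes and two made-up reference sets.
background = [f"G{i}" for i in range(100)]
gene_sets = {
    "SET_A": [f"G{i}" for i in range(0, 10)],
    "SET_B": [f"G{i}" for i in range(50, 60)],
}
module_genes = [f"G{i}" for i in range(0, 8)] + ["G50", "G90"]

ranked = score_module(module_genes, gene_sets, background)
```

In this toy case the module shares eight of its ten genes with `SET_A` and one with `SET_B`, so `SET_A` ranks first. With real data, the p-values would also need correction for testing many gene sets.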

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4451/6615267/97d9df7fed44/12859_2019_2952_Fig1_HTML.jpg
