Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark.
Center for Surgical Science, Zealand University Hospital, Lykkebækvej 1, 4600 Koege, Denmark.
Cells. 2021 Dec 28;11(1):85. doi: 10.3390/cells11010085.
Autoencoders have been used to model single-cell mRNA-sequencing data with the purpose of denoising, visualization, data simulation, and dimensionality reduction. We, and others, have shown that autoencoders can be explainable models and interpreted in terms of biology. Here, we show that such autoencoders can generalize to the extent that they can transfer directly without additional training. In practice, we can extract biological modules, denoise, and classify data correctly from an autoencoder that was trained on a different dataset and with different cells (a foreign model). We deconvoluted the biological signal encoded in the bottleneck layer of scRNA-models using saliency maps and mapped salient features to biological pathways. Biological concepts could be associated with specific nodes and interpreted in relation to biological pathways. Even in this unsupervised framework, with no prior information about cell types or labels, the specific biological pathways deduced from the model were in line with findings in previous research. It was hypothesized that autoencoders could learn and represent meaningful biology; here, we show with a systematic experiment that this is true and even transcends the training data. This means that carefully trained autoencoders can be used to assist the interpretation of new unseen data.
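The saliency step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not specify the architecture, so a single-layer tanh encoder is assumed, random weights stand in for a trained model, and the gene names and the helpers `encode`, `bottleneck_saliency`, and `top_genes` are hypothetical. The saliency map of a bottleneck node is taken here to be the gradient of that node's activation with respect to the input genes. Mapping the top-ranked genes to biological pathways would be a separate step and is not shown.

```python
import numpy as np

def encode(x, W, b):
    """Bottleneck activations z = tanh(W x + b) for one cell's expression vector x.
    (Assumed one-layer encoder; the real model may be deeper.)"""
    return np.tanh(W @ x + b)

def bottleneck_saliency(x, W, b):
    """Gradient of every bottleneck node with respect to every input gene.

    Returns an array of shape (n_nodes, n_genes); row k is the saliency
    map of node k for this cell.
    """
    z = encode(x, W, b)
    # Chain rule: d tanh(a)/da = 1 - tanh(a)^2, and a = W x + b gives da/dx = W.
    return (1.0 - z ** 2)[:, None] * W

def top_genes(saliency_row, gene_names, n=3):
    """Names of the n genes with the largest absolute saliency for one node."""
    order = np.argsort(-np.abs(saliency_row))[:n]
    return [gene_names[i] for i in order]

# Toy data: random weights and expression values, placeholders only.
rng = np.random.default_rng(0)
n_genes, n_nodes = 6, 2
gene_names = [f"gene_{i}" for i in range(n_genes)]  # hypothetical gene labels
W = rng.normal(size=(n_nodes, n_genes))
b = np.zeros(n_nodes)
x = rng.random(n_genes)

S = bottleneck_saliency(x, W, b)
for k in range(n_nodes):
    print(f"node {k}: most salient genes {top_genes(S[k], gene_names)}")
```

For a deeper encoder the same quantity would normally be obtained with automatic differentiation rather than a hand-written derivative; the closed form is used here only to keep the sketch dependency-free.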