Cell Identity Codes: Understanding Cell Identity from Gene Expression Profiles using Deep Neural Networks.

Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.

Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran.

Publication information

Sci Rep. 2019 Feb 20;9(1):2342. doi: 10.1038/s41598-019-38798-y.

Abstract

Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms such as linear discriminators or support vector machines is limited by the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific deep autoencoder structure that can encode a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although we use an unsupervised approach to train the autoencoder, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it has never seen during training. It can also tolerate some noise in the measurement of the GEP. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.
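To make the idea of a 30-value code concrete, here is a minimal NumPy sketch of an autoencoder with a 30-unit bottleneck. It is not the authors' architecture: the abstract gives only the code length, so the single hidden layer of 64 units, the tanh activations, the plain gradient-descent training loop, and the synthetic data standing in for real GEPs are all assumptions made for illustration. The helper names (init_params, forward, encode, train, mean_profile_correlation) are hypothetical.

```python
import numpy as np

CODE_SIZE = 30          # length of the cell identity code, as stated in the abstract
TANH_LAYERS = (0, 2)    # assumption: tanh on the two hidden layers, linear elsewhere


def init_params(n_genes, hidden=64, code=CODE_SIZE, seed=0):
    """Random weights for a genes -> hidden -> code -> hidden -> genes network."""
    rng = np.random.default_rng(seed)
    sizes = [n_genes, hidden, code, hidden, n_genes]
    return [(rng.normal(0.0, np.sqrt(1.0 / a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the input followed by the activations of every layer."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(np.tanh(z) if i in TANH_LAYERS else z)
    return acts


def encode(params, x):
    """The bottleneck activations play the role of the CIC in this sketch."""
    return forward(params, x)[2]


def train(params, x, epochs=500, lr=0.003):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    losses = []
    for _ in range(epochs):
        acts = forward(params, x)
        err = acts[-1] - x
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size               # dLoss / d(output)
        updated = []
        for i in reversed(range(len(params))):
            w, b = params[i]
            if i in TANH_LAYERS:
                grad = grad * (1.0 - acts[i + 1] ** 2)   # tanh derivative
            grad_w = acts[i].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ w.T                     # propagate to the previous layer
            updated.append((w - lr * grad_w, b - lr * grad_b))
        params = updated[::-1]
    return params, losses


def mean_profile_correlation(a, b):
    """Average Pearson correlation between matching rows of a and b."""
    return float(np.mean([np.corrcoef(r, s)[0, 1] for r, s in zip(a, b)]))


# Synthetic stand-in for expression data: 120 profiles x 200 genes driven by
# 8 hidden factors. Real GEPs would be loaded and normalised here instead.
rng = np.random.default_rng(1)
latent = rng.normal(size=(120, 8))
mixing = rng.normal(size=(8, 200))
gep = latent @ mixing + 0.1 * rng.normal(size=(120, 200))
gep = (gep - gep.mean(axis=0)) / gep.std(axis=0)   # standardise each gene

params = init_params(n_genes=gep.shape[1])
params, losses = train(params, gep)
cic = encode(params, gep)                          # one 30-value code per profile
reconstruction = forward(params, gep)[-1]
print("reconstruction MSE: first epoch", losses[0], "last epoch", losses[-1])
print("mean profile correlation:", mean_profile_correlation(gep, reconstruction))
```

The per-profile correlation between input and reconstruction is one simple way to express the abstract's comparison with technical replicates; the metric actually used in the paper may differ. The conservative learning rate keeps plain gradient descent stable in this toy setting, so the reconstruction here will be far rougher than what a properly tuned deep network would achieve.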

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a7b/6382891/2ab338871174/41598_2019_38798_Fig1_HTML.jpg
