Cell Identity Codes: Understanding Cell Identity from Gene Expression Profiles using Deep Neural Networks.

Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.

Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran.

Publication information

Sci Rep. 2019 Feb 20;9(1):2342. doi: 10.1038/s41598-019-38798-y.

DOI:10.1038/s41598-019-38798-y

PMID:30787315

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6382891/

Abstract

Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms such as linear discriminators or support vector machines is limited by the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific deep autoencoder structure that encodes a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although the autoencoder is trained with an unsupervised approach, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it never saw during training. It can also tolerate some noise in the measurement of the GEP. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.
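The encoding idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the abstract only fixes the code length at 30, so the single 64-unit hidden layer, the tanh activations, the learning rate, the plain gradient-descent training, and the synthetic data standing in for real GEPs are all assumptions made for illustration. The function names (`init_params`, `forward`, `encode`, `train_step`) are hypothetical.

```python
import numpy as np

CODE_SIZE = 30  # length of the cell identity code (CIC) stated in the abstract


def init_params(n_genes, hidden=64, code=CODE_SIZE, seed=0):
    """Random weights for a GEP -> hidden -> code -> hidden -> GEP network.

    The hidden width of 64 is an assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_genes, hidden, code, hidden, n_genes]
    return [(rng.normal(0.0, np.sqrt(1.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the activations of every layer (tanh inside, linear output)."""
    acts = [x]
    last = len(params) - 1
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == last else np.tanh(z))
    return acts


def encode(params, x):
    """Map GEPs to codes: acts[0] is the input, acts[2] the bottleneck."""
    return forward(params, x)[2]


def train_step(params, x, lr):
    """One full-batch gradient step on the mean squared reconstruction error."""
    acts = forward(params, x)
    loss = float(np.mean((acts[-1] - x) ** 2))
    grad = 2.0 * (acts[-1] - x) / x.size  # d(loss)/d(output)
    updated = []
    last = len(params) - 1
    for i in range(last, -1, -1):
        w, b = params[i]
        if i != last:
            grad = grad * (1.0 - acts[i + 1] ** 2)  # back through tanh
        updated.append((w - lr * (acts[i].T @ grad), b - lr * grad.sum(axis=0)))
        grad = grad @ w.T  # gradient w.r.t. the previous layer's activations
    return updated[::-1], loss


# Synthetic stand-in for GEPs (an assumption): 100 samples x 200 "genes"
# driven by 5 latent factors plus a little measurement noise.
rng = np.random.default_rng(1)
gep = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 200))
gep += 0.1 * rng.normal(size=gep.shape)
gep = (gep - gep.mean(axis=0)) / gep.std(axis=0)  # standardize each gene

params = init_params(n_genes=gep.shape[1])
losses = []
for _ in range(500):
    params, loss = train_step(params, gep, lr=0.01)
    losses.append(loss)

cic = encode(params, gep)                  # one 30-value code per profile
reconstruction = forward(params, gep)[-1]  # GEP rebuilt from the code
```

Here `cic` plays the role of the CIC and `reconstruction` the reproduced GEP. A classifier head fed by `cic` would be one way to approximate the classifier autoencoder mentioned above, though the abstract does not describe how the authors combine the two objectives.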

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a7b/6382891/2ab338871174/41598_2019_38798_Fig1_HTML.jpg
