Gene expression inference with deep learning.

Author information

Chen Yifei, Li Yi, Narayan Rajiv, Subramanian Aravind, Xie Xiaohui

Affiliations

Department of Computer Science, University of California, Irvine, CA 92697, USA; Baidu Research-Big Data Lab, Beijing, 100085, China.

Department of Computer Science, University of California, Irvine, CA 92697, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):1832-9. doi: 10.1093/bioinformatics/btw074. Epub 2016 Feb 11.

Abstract

MOTIVATION

Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiling has been dropping steadily, generating a compendium of expression profiles over thousands of samples is still very expensive. Recognizing that the expression levels of genes are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy: profile only ∼1000 carefully selected landmark genes and rely on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), which limits its accuracy because it does not capture complex nonlinear relationships among the expression levels of genes.
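
As a rough illustration of the LR approach described above, the following minimal sketch fits one least-squares linear model per target gene from landmark-gene expression. It is not the LINCS implementation: the function names, matrix sizes and synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_baseline(landmarks, targets):
    """Fit a least-squares linear model (with intercept) for every target gene.

    landmarks: (n_samples, n_landmark) expression matrix
    targets:   (n_samples, n_target) expression matrix
    Returns weights of shape (n_landmark + 1, n_target); the last row is the intercept.
    """
    # Append a column of ones so the intercept is learned with the weights.
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def predict_linear_baseline(weights, landmarks):
    """Infer target-gene expression from landmark-gene expression."""
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    return design @ weights

# Synthetic toy data (hypothetical sizes; the real setting has ~1000 landmark genes).
rng = np.random.default_rng(0)
n_samples, n_landmark, n_target = 200, 10, 5
X = rng.normal(size=(n_samples, n_landmark))
true_W = rng.normal(size=(n_landmark, n_target))
Y = X @ true_W + 0.5 + 0.01 * rng.normal(size=(n_samples, n_target))

W = fit_linear_baseline(X, Y)
Y_hat = predict_linear_baseline(W, X)
print("training MAE:", np.abs(Y_hat - Y).mean())
```

Because the toy targets are generated linearly, the fit here is nearly exact; a model of this form cannot represent nonlinear dependencies between genes, which is the limitation the abstract points to.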

RESULTS

We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance with that of other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR, with a 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR, with a 6.57% relative improvement, and achieves lower error in 81.31% of the target genes.
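
The comparison metrics named above can be sketched as follows. This is an illustrative example on made-up numbers, not the authors' evaluation code. The abstract does not define "relative improvement", so the formula used here, (MAE_baseline - MAE_candidate) / MAE_baseline, is an assumption, and all function and variable names are hypothetical.

```python
import numpy as np

def gene_wise_mae(y_true, y_pred):
    """Mean absolute error per gene (column), averaged over samples (rows)."""
    return np.abs(y_true - y_pred).mean(axis=0)

def compare_models(y_true, pred_baseline, pred_candidate):
    """Return the overall MAE of both models, the relative improvement, and
    the fraction of genes on which the candidate has lower error."""
    mae_base = gene_wise_mae(y_true, pred_baseline)
    mae_cand = gene_wise_mae(y_true, pred_candidate)
    overall_base = mae_base.mean()   # MAE averaged across all genes
    overall_cand = mae_cand.mean()
    # Assumed definition: error reduction relative to the baseline error.
    relative_improvement = (overall_base - overall_cand) / overall_base
    fraction_better = np.mean(mae_cand < mae_base)
    return overall_base, overall_cand, relative_improvement, fraction_better

# Toy data: 2 samples x 3 genes, with fixed per-gene prediction offsets.
y_true = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0]])
pred_baseline = y_true + np.array([0.4, 0.2, 0.4])
pred_candidate = y_true + np.array([0.2, 0.3, 0.1])

base, cand, rel, frac = compare_models(y_true, pred_baseline, pred_candidate)
print(f"baseline MAE={base:.3f}, candidate MAE={cand:.3f}")
print(f"relative improvement={rel:.1%}, genes with lower error={frac:.1%}")
```

In this toy case the candidate lowers the overall error even though it is worse on one of the three genes, which is why the abstract reports both an averaged figure and a gene-wise percentage.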

AVAILABILITY AND IMPLEMENTATION

D-GEX is available at https://github.com/uci-cbcl/D-GEX

CONTACT

xhx@ics.uci.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
