Mining influential genes based on deep learning.

Author information

College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, 210095, China.

Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China.

Publication information

BMC Bioinformatics. 2021 Jan 22;22(1):27. doi: 10.1186/s12859-021-03972-5.

DOI:10.1186/s12859-021-03972-5

PMID:33482718

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7821411/

Abstract

BACKGROUND

Large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbations, and drug action. To address the cost of ever-expanding gene expression profiling, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of the information in the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to deeply mine the influential genes in the genome.

RESULTS

Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The results show that our landmark genes can predict target genes more accurately and robustly than those of L1000, based on two metrics: mean absolute error (MAE) and Pearson correlation coefficient (PCC). This reveals that the landmark genes detected by our method contain more genomic information.
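As a rough illustration of the two evaluation metrics named above, the following minimal sketch computes MAE and PCC per target gene with NumPy. It is not the authors' code. The function names, the samples-by-genes array layout, and the top-k selection rule for turning importance scores into a landmark set are all assumptions made for this example.

```python
import numpy as np


def mae_per_gene(y_true, y_pred):
    """Mean absolute error of each target gene (column), averaged over samples (rows)."""
    return np.mean(np.abs(y_true - y_pred), axis=0)


def pcc_per_gene(y_true, y_pred):
    """Pearson correlation of each target gene (column) across samples (rows).

    Assumes no column is constant; a constant column would give a zero denominator.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


def top_k_genes(importance, k):
    """Indices of the k highest-scoring genes, in descending order of score.

    Hypothetical selection rule: the abstract does not state how importance
    scores are turned into a landmark set.
    """
    return np.argsort(importance)[::-1][:k]


# Toy data: 3 samples x 2 target genes (illustrative values only).
y_true = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
y_pred = np.array([[1.5, 2.0], [2.5, 4.0], [3.5, 6.0]])

print(mae_per_gene(y_true, y_pred))
print(pcc_per_gene(y_true, y_pred))
print(top_k_genes(np.array([0.1, 0.9, 0.5]), k=2))
```

Lower MAE and higher PCC both indicate that the target genes are predicted more closely from a given landmark set, which is the sense in which the two landmark sets are compared.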

CONCLUSIONS

We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9331/7821411/1c9724f82f4d/12859_2021_3972_Fig1_HTML.jpg

Similar articles

D-GPM: A Deep Learning Method for Gene Promoter Methylation Inference.

Genes (Basel). 2019 Oct 14;10(10):807. doi: 10.3390/genes10100807.

Transforming L1000 profiles to RNA-seq-like profiles with deep learning.

BMC Bioinformatics. 2022 Sep 13;23(1):374. doi: 10.1186/s12859-022-04895-5.

Gene expression inference with deep learning.

Bioinformatics. 2016 Jun 15;32(12):1832-9. doi: 10.1093/bioinformatics/btw074. Epub 2016 Feb 11.

Deep Large-Scale Multitask Learning Network for Gene Expression Inference.

J Comput Biol. 2021 May;28(5):485-500. doi: 10.1089/cmb.2020.0438.

Conditional generative adversarial network for gene expression inference.

Bioinformatics. 2018 Sep 1;34(17):i603-i611. doi: 10.1093/bioinformatics/bty563.

A genome-scale deep learning model to predict gene expression changes of genetic perturbations from multiplex biological networks.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae433.

Identification of human circadian genes based on time course gene expression profiles by using a deep learning method.

Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2274-2283. doi: 10.1016/j.bbadis.2017.12.004. Epub 2017 Dec 12.

Representing high throughput expression profiles via perturbation barcodes reveals compound targets.

PLoS Comput Biol. 2017 Feb 9;13(2):e1005335. doi: 10.1371/journal.pcbi.1005335. eCollection 2017 Feb.

SWnet: a deep learning model for drug response prediction from cancer genomic signatures and compound chemical structures.

BMC Bioinformatics. 2021 Sep 10;22(1):434. doi: 10.1186/s12859-021-04352-9.

Cited by

DeepSplice: a deep learning approach for accurate prediction of alternative splicing events in the human genome.

Front Genet. 2024 Jun 21;15:1349546. doi: 10.3389/fgene.2024.1349546. eCollection 2024.

Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications.

Cancers (Basel). 2023 Mar 24;15(7):1958. doi: 10.3390/cancers15071958.

Interpretable meta-learning of multi-omics data for survival analysis and pathway enrichment.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad113.

DRUG-seq Provides Unbiased Biological Activity Readouts for Neuroscience Drug Discovery.

ACS Chem Biol. 2022 Jun 17;17(6):1401-1414. doi: 10.1021/acschembio.1c00920. Epub 2022 May 4.

Interpretation of convolutional neural networks reveals crucial sequence features involving in transcription during fiber development.

BMC Bioinformatics. 2022 Mar 15;23(1):91. doi: 10.1186/s12859-022-04619-9.

References

MTTFsite: cross-cell type TF binding site prediction by using multi-task learning.

Bioinformatics. 2019 Dec 15;35(24):5067-5077. doi: 10.1093/bioinformatics/btz451.

i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome.

Bioinformatics. 2019 Aug 15;35(16):2796-2800. doi: 10.1093/bioinformatics/btz015.

Promoter analysis and prediction in the human genome using sequence-based deep learning models.

Bioinformatics. 2019 Aug 15;35(16):2730-2737. doi: 10.1093/bioinformatics/bty1068.

Conditional generative adversarial network for gene expression inference.

Bioinformatics. 2018 Sep 1;34(17):i603-i611. doi: 10.1093/bioinformatics/bty563.

DeepGSR: an optimized deep-learning structure for the recognition of genomic signals and regions.

Bioinformatics. 2019 Apr 1;35(7):1125-1132. doi: 10.1093/bioinformatics/bty752.

SpliceRover: interpretable convolutional neural networks for improved splice site prediction.

Bioinformatics. 2018 Dec 15;34(24):4180-4188. doi: 10.1093/bioinformatics/bty497.

deepNF: deep network fusion for protein function prediction.

Bioinformatics. 2018 Nov 15;34(22):3873-3881. doi: 10.1093/bioinformatics/bty440.

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Cell. 2017 Nov 30;171(6):1437-1452.e17. doi: 10.1016/j.cell.2017.10.049.

Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer.

Clin Cancer Res. 2018 Mar 15;24(6):1248-1259. doi: 10.1158/1078-0432.CCR-17-0853. Epub 2017 Oct 5.

In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus.

Neuron. 2016 Oct 19;92(2):342-357. doi: 10.1016/j.neuron.2016.10.001.
