Deep neural network prediction of genome-wide transcriptome signatures - beyond the Black-box.

Affiliations

Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden.

School of Bioscience, Systems Biology Research Center, University of Skövde, Skövde, Sweden.

Publication information

NPJ Syst Biol Appl. 2022 Feb 23;8(1):9. doi: 10.1038/s41540-022-00218-9.

DOI:10.1038/s41540-022-00218-9

PMID:35197482

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8866467/

Abstract

Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained NN, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
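
The core idea of the abstract (predict target-gene expression from TF expression, score the fraction of variance explained, then probe the trained model one input at a time) can be illustrated with a minimal sketch. This is not the authors' method: an ordinary least-squares linear map stands in for the deep neural network, the data are synthetic, and every name here (`fit_linear`, `explained_variance`, `light_up`, the matrix sizes) is a hypothetical chosen for illustration. The `light_up` probe follows only the general description of activating a single input and reading the outputs.

```python
import numpy as np

def add_intercept(tf_expr):
    """Append a column of ones so the model can learn a per-gene offset."""
    return np.hstack([tf_expr, np.ones((tf_expr.shape[0], 1))])

def fit_linear(tf_expr, gene_expr):
    """Least-squares map from TF expression to target-gene expression.
    Assumption: a linear stand-in for the neural network in the paper."""
    W, *_ = np.linalg.lstsq(add_intercept(tf_expr), gene_expr, rcond=None)
    return W

def predict(W, tf_expr):
    return add_intercept(tf_expr) @ W

def explained_variance(y_true, y_pred):
    """Fraction of the total variance over all genes captured by the prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

def light_up(W, tf_index, n_tfs):
    """Change in every predicted gene when one TF input is set to 1
    and all others to 0, relative to an all-zero input."""
    probe = np.zeros((1, n_tfs))
    probe[0, tf_index] = 1.0
    baseline = np.zeros((1, n_tfs))
    return (predict(W, probe) - predict(W, baseline))[0]

# Synthetic data (assumption): a sparse TF-to-gene map plus small noise.
rng = np.random.default_rng(0)
n_samples, n_tfs, n_genes = 400, 20, 50
mask = rng.random((n_tfs, n_genes)) < 0.2
true_W = rng.normal(size=(n_tfs, n_genes)) * mask
tfs = rng.normal(size=(n_samples, n_tfs))
genes = tfs @ true_W + 0.1 * rng.normal(size=(n_samples, n_genes))

# Fit on the first 300 samples, evaluate on the held-out 100.
train, test = slice(0, 300), slice(300, 400)
W = fit_linear(tfs[train], genes[train])
r2 = explained_variance(genes[test], predict(W, tfs[test]))

# Probe TF 0: genes with a large response are its putative targets.
response = light_up(W, 0, n_tfs)
print(f"held-out variance explained: {r2:.3f}")
print("genes most affected by TF 0:", np.argsort(-np.abs(response))[:5])
```

For a linear model the light-up response simply recovers one row of the weight matrix; with a nonlinear network the same probe would depend on the chosen baseline input, which is why the baseline is made explicit here.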

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8207/8866467/6ec02d026eb3/41540_2022_218_Fig1_HTML.jpg
