Deep Mining from Omics Data.

Affiliations

School of Science and Technology, Department of Computer Science, Nottingham Trent University, Nottingham, UK.

Perceptronix Ltd, Hilton, Derbyshire, UK.

Publication information

Methods Mol Biol. 2022;2449:349-386. doi: 10.1007/978-1-0716-2095-3_15.

DOI: 10.1007/978-1-0716-2095-3_15
PMID: 35507271

Abstract

Since the advent of high-throughput omics technologies, various molecular data such as genes, transcripts, proteins, and metabolites have been made widely available to researchers. This has afforded clinicians, bioinformaticians, statisticians, and data scientists the opportunity to apply their innovations in feature mining and predictive modeling to a rich data resource to develop a wide range of generalizable prediction models. What has become apparent over the last 10 years is that researchers have adopted deep neural networks (or "deep nets") as their preferred paradigm of choice for complex data modeling due to the superiority of performance over more traditional statistical machine learning approaches, such as support vector machines. A key stumbling block, however, is that deep nets inherently lack transparency and are considered to be a "black box" approach. This naturally makes it very difficult for clinicians and other stakeholders to trust their deep learning models even though the model predictions appear to be highly accurate. In this chapter, we therefore provide a detailed summary of the deep net architectures typically used in omics research, together with a comprehensive summary of the notable "deep feature mining" techniques researchers have applied to open up this black box and provide some insights into the salient input features and why these models behave as they do. We group these techniques into the following three categories: (a) hidden layer visualization and interpretation; (b) input feature importance and impact evaluation; and (c) output layer gradient analysis. 
While we find that omics researchers have made some considerable gains in opening up the black box through interpretation of the hidden layer weights and node activations to identify salient input features, we highlight other approaches for omics researchers, such as employing deconvolutional network-based approaches and development of bespoke attribute impact measures to enable researchers to better understand the relationships between the input data and hidden layer representations formed and thus the output behavior of their deep nets.
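As a rough illustration of category (c), output layer gradient analysis, the sketch below computes the gradient of a network's output with respect to each input feature and ranks the features by absolute gradient. It is not taken from the chapter: the one-hidden-layer network, its weights, the input sample, and the feature names (`gene_A`, `gene_B`, `gene_C`) are all hypothetical, chosen only so the chain rule can be followed by hand.

```python
import math

def forward(x, W1, b1, w2, b2):
    """Return (hidden activations, output probability) of a tiny one-hidden-layer net."""
    # tanh hidden layer: one weighted sum per hidden unit
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    # sigmoid output unit
    z = sum(w * hj for w, hj in zip(w2, h)) + b2
    y = 1.0 / (1.0 + math.exp(-z))
    return h, y

def input_gradient(x, W1, b1, w2, b2):
    """Gradient of the output probability with respect to each input feature."""
    h, y = forward(x, W1, b1, w2, b2)
    dy_dz = y * (1.0 - y)  # derivative of the sigmoid at the output
    grads = []
    for i in range(len(x)):
        # chain rule through every hidden unit j: dz/dh_j * dh_j/dx_i
        dz_dxi = sum(w2[j] * (1.0 - h[j] ** 2) * W1[j][i] for j in range(len(h)))
        grads.append(dy_dz * dz_dxi)
    return grads

# Hypothetical weights (assumption). The third input has zero weights,
# so its gradient is zero and it should rank last.
W1 = [[0.8, -0.1, 0.0], [-0.5, 0.2, 0.0]]
b1 = [0.0, 0.1]
w2 = [1.2, -0.7]
b2 = 0.05

x = [0.5, -1.0, 2.0]                       # one hypothetical expression profile
features = ["gene_A", "gene_B", "gene_C"]  # hypothetical feature names

saliency = input_gradient(x, W1, b1, w2, b2)
ranking = sorted(zip(features, saliency), key=lambda p: abs(p[1]), reverse=True)
for name, g in ranking:
    print(f"{name}: {g:+.4f}")
```

The sign of each gradient indicates whether increasing that feature pushes the prediction up or down for this sample, and the magnitude gives a local, sample-specific measure of sensitivity. In practice an automatic differentiation framework would replace the hand-written chain rule.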

Similar articles

1. Deep Mining from Omics Data.
   Methods Mol Biol. 2022;2449:349-386. doi: 10.1007/978-1-0716-2095-3_15.
2. Architectures and accuracy of artificial neural network for disease classification from omics data.
   BMC Genomics. 2019 Mar 4;20(1):167. doi: 10.1186/s12864-019-5546-z.
3. Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes.
   Artif Intell Med. 2019 Jul;98:109-134. doi: 10.1016/j.artmed.2019.07.007. Epub 2019 Jul 26.
4. A novel deep mining model for effective knowledge discovery from omics data.
   Artif Intell Med. 2020 Apr;104:101821. doi: 10.1016/j.artmed.2020.101821. Epub 2020 Feb 24.
5. Data Integration Using Advances in Machine Learning in Drug Discovery and Molecular Biology.
   Methods Mol Biol. 2021;2190:167-184. doi: 10.1007/978-1-0716-0826-5_7.
6. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
   Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.
7. Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE).
   BMC Genomics. 2019 Dec 20;20(Suppl 11):944. doi: 10.1186/s12864-019-6285-x.
8. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics.
   Hum Genet. 2022 Sep;141(9):1515-1528. doi: 10.1007/s00439-021-02402-z. Epub 2021 Dec 4.
9. Advances in AI and machine learning for predictive medicine.
   J Hum Genet. 2024 Oct;69(10):487-497. doi: 10.1038/s10038-024-01231-y. Epub 2024 Feb 29.
10. Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer.
    Genome Med. 2021 Mar 11;13(1):42. doi: 10.1186/s13073-021-00845-7.

Cited by

1. Multi-omics: a bridge connecting genotype and phenotype for epilepsy?
   Biomark Res. 2025 Jun 18;13(1):86. doi: 10.1186/s40364-025-00798-8.
2. Artificial Intelligence and Heart-Brain Connections: A Narrative Review on Algorithms Utilization in Clinical Practice.
   Healthcare (Basel). 2024 Jul 10;12(14):1380. doi: 10.3390/healthcare12141380.