
ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages.

Affiliations

Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53706, USA.

Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.

Publication information

Bioinformatics. 2021 May 23;37(8):1115-1124. doi: 10.1093/bioinformatics/btaa935.

DOI:10.1093/bioinformatics/btaa935
PMID:33305308
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8150141/
Abstract

MOTIVATION

Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a 'black box', barely providing biological and clinical interpretability from the box.

RESULTS

To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine.
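The classification side of a discriminative restricted Boltzmann machine can be illustrated with a minimal sketch. It uses the textbook class posterior, p(y | x) ∝ exp(d_y + Σ_j softplus(c_j + U_jy + Σ_i W_ji x_i)), and is not taken from the ECMarker code base; the names `drbm_class_probs`, `W`, `U`, `c`, `d` and the toy dimensions are assumptions made for illustration. In this standard formulation, a lateral term among the input genes depends only on x, so it cancels in p(y | x) and would influence the model only through the generative part of training. The repository linked in this abstract holds the authors' actual implementation.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def drbm_class_probs(x, W, U, c, d):
    """Return p(y | x) for a textbook discriminative RBM (illustrative only).

    x: list of n_genes expression values
    W: n_hidden x n_genes input-to-hidden weights
    U: n_hidden x n_classes class-to-hidden weights
    c: n_hidden hidden biases
    d: n_classes class biases
    """
    # Input contribution to each hidden unit; shared by all candidate classes.
    base = [c_j + sum(w * x_i for w, x_i in zip(W_j, x)) for c_j, W_j in zip(c, W)]
    # Unnormalized log-probability of each class (hidden units summed out).
    scores = [
        d[y] + sum(softplus(b + U_j[y]) for b, U_j in zip(base, U))
        for y in range(len(d))
    ]
    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with random, untrained parameters (hypothetical sizes).
random.seed(0)
n_genes, n_hidden, n_classes = 6, 4, 2
x = [random.gauss(0, 1) for _ in range(n_genes)]
W = [[random.gauss(0, 0.1) for _ in range(n_genes)] for _ in range(n_hidden)]
U = [[random.gauss(0, 0.1) for _ in range(n_classes)] for _ in range(n_hidden)]
c = [0.0] * n_hidden
d = [0.0] * n_classes

probs = drbm_class_probs(x, W, U, c, d)
print(probs)  # two class probabilities that sum to 1
```

With trained parameters, the two classes could stand for early and late cancer stage, and the rows of `W` would link each hidden unit to the input genes.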

AVAILABILITY AND IMPLEMENTATION

ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
