Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.

Affiliations

Clemson University, Department of Electrical and Computer Engineering, Clemson, SC, 29634, USA.

Clemson University, Department of Genetics and Biochemistry, Clemson, SC, 29634, USA.

Publication information

Sci Rep. 2019 Jul 5;9(1):9747. doi: 10.1038/s41598-019-46059-1.

DOI: 10.1038/s41598-019-46059-1

PMID: 31278367

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6611793/

Abstract

Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
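
The two-stage idea in the abstract (score gene combinations by how well they classify samples, then separate candidate from non-candidate genes) can be illustrated with a toy sketch. This is not Gene Oracle's code: a nearest-centroid classifier stands in for the neural network, and the synthetic data, the subset size, the per-gene averaging rule, and every function name below are assumptions made for illustration only.

```python
import itertools
import random
from collections import Counter


def make_toy_data(n_per_class=40, n_genes=6, informative=(0, 1), seed=0):
    """Synthetic expression matrix (assumption): only `informative` genes differ by class."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 3.0 * label  # shift informative genes in class 1
            samples.append(row)
            labels.append(label)
    return samples, labels


def centroid_accuracy(samples, labels, genes):
    """Held-out accuracy of a nearest-centroid classifier restricted to `genes`.

    Even-indexed samples are used for training, odd-indexed ones for testing.
    """
    pairs = list(zip(samples, labels))
    train, test = pairs[0::2], pairs[1::2]
    centroids = {}
    for y in set(labels):
        rows = [s for s, lab in train if lab == y]
        centroids[y] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for s, y in test:
        pred = min(
            centroids,
            key=lambda c: sum((s[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        correct += pred == y
    return correct / len(test)


def score_genes(samples, labels, gene_set, subset_size=2):
    """Average the accuracy of every subset of `subset_size` genes that contains each gene."""
    totals, counts = Counter(), Counter()
    for subset in itertools.combinations(gene_set, subset_size):
        acc = centroid_accuracy(samples, labels, subset)
        for g in subset:
            totals[g] += acc
            counts[g] += 1
    return {g: totals[g] / counts[g] for g in gene_set}


def split_candidates(scores, top_k=2):
    """Label the `top_k` highest-scoring genes as candidates, the rest as non-candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], ranked[top_k:]


if __name__ == "__main__":
    samples, labels = make_toy_data()
    scores = score_genes(samples, labels, gene_set=list(range(6)))
    candidates, non_candidates = split_candidates(scores)
    print("candidate genes:", candidates)
    print("non-candidate genes:", non_candidates)
```

In this toy setting a gene scores highly when the subsets containing it classify well, which is one simple way to turn subset-level accuracy into a per-gene ranking. Exhaustive enumeration grows combinatorially with the gene set size, so a real gene set of a few hundred genes would need sampling or some other restriction on the subsets evaluated.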

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dac/6611793/cfa090d15450/41598_2019_46059_Fig1_HTML.jpg

Similar articles

Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management.

BJU Int. 2011 Jul;108(2 Pt 2):E29-35. doi: 10.1111/j.1464-410X.2010.09794.x. Epub 2011 Mar 16.

Identification of aberrantly methylated differentially expressed genes in breast cancer by integrated bioinformatics analysis.

J Cell Biochem. 2019 Sep;120(9):16229-16243. doi: 10.1002/jcb.28904. Epub 2019 May 12.

Identification of core genes and outcomes in hepatocellular carcinoma by bioinformatics analysis.

J Cell Biochem. 2019 Jun;120(6):10069-10081. doi: 10.1002/jcb.28290. Epub 2018 Dec 7.

GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

BMC Med Genomics. 2016 Dec 5;9(Suppl 3):70. doi: 10.1186/s12920-016-0231-4.

CancerLivER: a database of liver cancer gene expression resources and biomarkers.

Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa012.

Identification of Hepatocellular Carcinoma-Related Potential Genes and Pathways Through Bioinformatic-Based Analyses.

Genet Test Mol Biomarkers. 2019 Nov;23(11):766-777. doi: 10.1089/gtmb.2019.0063. Epub 2019 Oct 18.

Computational identification of biomarker genes for lung cancer considering treatment and non-treatment studies.

BMC Bioinformatics. 2020 Dec 3;21(Suppl 9):218. doi: 10.1186/s12859-020-3524-8.

Computational analysis of fused co-expression networks for the identification of candidate cancer gene biomarkers.

NPJ Syst Biol Appl. 2021 Mar 12;7(1):17. doi: 10.1038/s41540-021-00175-9.

Biomarker identification and cancer classification based on microarray data using Laplace naive Bayes model with mean shrinkage.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1649-62. doi: 10.1109/TCBB.2012.105.

Cited by

HAPIR: a refined Hallmark gene set-based machine learning approach for predicting immunotherapy response in cancer patients.

NPJ Precis Oncol. 2025 Jun 18;9(1):194. doi: 10.1038/s41698-025-00992-9.

GEMDiff: a diffusion workflow bridges between normal and tumor gene expression states: a breast cancer case study.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf093.

GSFM: A genome-scale functional module transformation to represent drug efficacy for drug discovery.

Acta Pharm Sin B. 2025 Jan;15(1):133-150. doi: 10.1016/j.apsb.2024.08.017. Epub 2024 Aug 24.

DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data.

Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae434.

Mitofusin-2 enhances cervical cancer progression through Wnt/β-catenin signaling.

BMB Rep. 2024 Apr;57(4):194-199. doi: 10.5483/BMBRep.2023-0205.

Palovarotene Action Against Heterotopic Ossification Includes a Reduction of Local Participating Activin A-Expressing Cell Populations.

JBMR Plus. 2023 Oct 19;7(12):e10821. doi: 10.1002/jbm4.10821. eCollection 2023 Dec.

Comprehensive drug response profiling and pan-omic analysis identified therapeutic candidates and prognostic biomarkers for Asian cholangiocarcinoma.

iScience. 2022 Sep 23;25(10):105182. doi: 10.1016/j.isci.2022.105182. eCollection 2022 Oct 21.

Simulating the restoration of normal gene expression from different thyroid cancer stages using deep learning.

BMC Cancer. 2022 Jun 4;22(1):612. doi: 10.1186/s12885-022-09704-z.

Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue.

BMC Genomics. 2022 May 6;23(1):350. doi: 10.1186/s12864-022-08591-9.

A Comprehensive Risk Assessment and Stratification Model of Papillary Thyroid Carcinoma Based on the Autophagy-Related LncRNAs.

Front Oncol. 2022 Feb 24;11:771556. doi: 10.3389/fonc.2021.771556. eCollection 2021.
