Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.

Affiliations

Clemson University, Department of Electrical and Computer Engineering, Clemson, SC, 29634, USA.

Clemson University, Department of Genetics and Biochemistry, Clemson, SC, 29634, USA.

Publication information

Sci Rep. 2019 Jul 5;9(1):9747. doi: 10.1038/s41598-019-46059-1.

Abstract

Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB), quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provided evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
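The two steps described above (scoring a gene set's classification potential, then decomposing it combinatorially into candidate and non-candidate genes) can be illustrated with a minimal Python sketch. This is not Gene Oracle's code, and several pieces are assumptions: to stay dependency-free it swaps a nearest-centroid classifier in for the neural network, it assumes significance is judged against random gene sets of the same size, and it uses a hypothetical candidate rule (a gene is a candidate if it recurs among the k-gene combinations scoring within `tol` of the best one). All function names, parameters, and the toy data are illustrative.

```python
import itertools
import random
from collections import Counter


def classification_potential(X, y, genes, train_frac=0.7, seed=0):
    """Held-out accuracy of a nearest-centroid classifier restricted to `genes`.

    Stand-in (assumption) for the neural network Gene Oracle trains; any
    classifier with a fit/score cycle could be substituted here.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)  # fixed seed -> same train/test split for every gene subset
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]

    # Per-class mean expression of the selected genes, from training samples only.
    sums, counts = {}, Counter()
    for i in train:
        vec = sums.setdefault(y[i], [0.0] * len(genes))
        for j, g in enumerate(genes):
            vec[j] += X[i][g]
        counts[y[i]] += 1
    centroids = {c: [v / counts[c] for v in vec] for c, vec in sums.items()}

    def predict(i):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )

    return sum(predict(i) == y[i] for i in test) / len(test)


def empirical_p_value(X, y, gene_set, all_genes, n_random=50, seed=1):
    """Assumed null: fraction of same-size random gene sets scoring at least as well."""
    rng = random.Random(seed)
    observed = classification_potential(X, y, gene_set)
    hits = sum(
        classification_potential(X, y, rng.sample(all_genes, len(gene_set))) >= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)


def candidate_genes(X, y, gene_set, k=2, tol=0.1, min_share=0.5):
    """Hypothetical decomposition rule: a gene is a candidate if it appears in at
    least `min_share` of the k-gene combinations scoring within `tol` of the best."""
    scores = {c: classification_potential(X, y, c) for c in itertools.combinations(gene_set, k)}
    best = max(scores.values())
    top = [c for c, s in scores.items() if s >= best - tol]
    freq = Counter(g for combo in top for g in combo)
    return sorted(g for g in gene_set if freq[g] / len(top) >= min_share)


# Toy data (hypothetical): 60 samples x 10 genes; genes 0 and 1 track the label,
# genes 2-9 are pure noise.
rng = random.Random(42)
X, y = [], []
for i in range(60):
    label = i % 2
    informative = [4 * label + rng.gauss(0, 0.5) for _ in range(2)]
    noise = [rng.gauss(0, 1) for _ in range(8)]
    X.append(informative + noise)
    y.append(label)

print(candidate_genes(X, y, gene_set=[0, 1, 2, 3]))
print(empirical_p_value(X, y, [0, 1], all_genes=list(range(10))))
```

On the toy data, genes 0 and 1 should come out as candidates, because every combination containing either of them separates the classes while the noise-only pair does not. In real use, `X` would be an RNA-seq expression matrix (samples by genes) such as those from TCGA or GTEx, and `gene_set` the column indices of a Hallmark gene set.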

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dac/6611793/cfa090d15450/41598_2019_46059_Fig1_HTML.jpg
