
Knowledge-based gene expression classification via matrix factorization.

Authors

Schachtner R, Lutter D, Knollmüller P, Tomé A M, Theis F J, Schmitz G, Stetter M, Vilda P Gómez, Lang E W

Affiliation

CIML/Biophysics, University of Regensburg, D-93040 Regensburg, Germany.

Publication information

Bioinformatics. 2008 Aug 1;24(15):1688-97. doi: 10.1093/bioinformatics/btn245. Epub 2008 Jun 5.

DOI:10.1093/bioinformatics/btn245
PMID:18535085
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2638868/
Abstract

MOTIVATION

Modern machine learning methods based on matrix decomposition techniques, such as independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools that are currently being explored for the analysis of gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). The extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, either by grouping samples into different categories for diagnostic purposes or by grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks.
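For orientation, the two decompositions named above can be written in one common notation. This notation is an assumption made here for illustration and is not taken from the abstract: X is the expression matrix with M samples as rows and N genes as columns, and k is the number of extracted components. Some authors use the transposed convention.

```latex
% Assumed notation: X is M x N (samples x genes), k << min(M, N).
\begin{aligned}
\text{NMF:}\quad & X \approx W H, \qquad W \in \mathbb{R}_{\ge 0}^{M \times k},\; H \in \mathbb{R}_{\ge 0}^{k \times N} \\
\text{ICA:}\quad & X = A S, \qquad A \in \mathbb{R}^{M \times k},\; S \in \mathbb{R}^{k \times N}
\end{aligned}
```

Under this convention the rows of H play the role of metagenes and the rows of S, chosen to be as statistically independent as possible, play the role of expression modes. The matrices W and A hold the weight of each component in each sample.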

RESULTS

In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. These datasets monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories. These categories correspond to either monocytes versus macrophages or healthy subjects versus Niemann-Pick C disease patients.
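The workflow described above (factorize, read marker genes off the components, check the classification with leave-one-out) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain NMF with Lee-Seung multiplicative updates instead of the sparse NMF and ICA used in the study, a tiny synthetic dataset instead of the microarray data, and a nearest-centroid classifier instead of random forests. All function names (`nmf`, `make_data`, `top_genes`, `loo_accuracy`) are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize X (samples x genes) as W @ H using multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H

def make_data(seed=1):
    """Synthetic stand-in for expression data: 6 samples, 8 genes, 2 classes."""
    rng = random.Random(seed)
    labels = [0, 0, 0, 1, 1, 1]
    X = []
    for y in labels:
        row = []
        for g in range(8):
            # Genes 0-3 are highly expressed in class 0, genes 4-7 in class 1.
            active = (g < 4) == (y == 0)
            row.append((4.0 if active else 0.0) + rng.uniform(0.0, 0.3))
        X.append(row)
    return X, labels

def top_genes(H, n_top=4):
    """Per metagene (row of H), the indices of the n_top largest gene weights."""
    return [sorted(sorted(range(len(row)), key=lambda g: -row[g])[:n_top])
            for row in H]

def loo_accuracy(W, labels):
    """Leave-one-out nearest-centroid classification on the rows of W."""
    hits = 0
    for i, w in enumerate(W):
        dists = {}
        for c in set(labels):
            rows = [W[j] for j in range(len(W)) if j != i and labels[j] == c]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            dists[c] = sum((a - b) ** 2 for a, b in zip(w, centroid))
        hits += min(dists, key=dists.get) == labels[i]
    return hits / len(W)

X, labels = make_data()
W, H = nmf(X, k=2)              # unsupervised: class labels are not used here
markers = top_genes(H)          # candidate marker gene sets, one per metagene
accuracy = loo_accuracy(W, labels)
print("marker gene sets per metagene:", markers)
print("leave-one-out accuracy:", accuracy)
```

In this sketch the factorization is fitted once on all samples without labels, and only the classifier is cross-validated. A stricter protocol would refit the factorization inside each leave-one-out fold and project the held-out sample onto the learned metagenes.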

Similar articles

1. Knowledge-based gene expression classification via matrix factorization.
   Bioinformatics. 2008 Aug 1;24(15):1688-97. doi: 10.1093/bioinformatics/btn245. Epub 2008 Jun 5.
2. Comparison of unsupervised and supervised gene selection methods.
   Annu Int Conf IEEE Eng Med Biol Soc. 2008;2008:5212-5. doi: 10.1109/IEMBS.2008.4650389.
3. Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data.
   BMC Bioinformatics. 2011;12 Suppl 5(Suppl 5):S7. doi: 10.1186/1471-2105-12-S5-S7. Epub 2011 Jul 27.
4. How to extract marker genes from microarray data sets.
   Annu Int Conf IEEE Eng Med Biol Soc. 2007;2007:4215-8. doi: 10.1109/IEMBS.2007.4353266.
5. Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization.
   BMC Bioinformatics. 2006 Feb 17;7:78. doi: 10.1186/1471-2105-7-78.
6. Cancer molecular pattern discovery by subspace consensus kernel classification.
   Comput Syst Bioinformatics Conf. 2007;6:55-65.
7. Tumor classification based on non-negative matrix factorization using gene expression data.
   IEEE Trans Nanobioscience. 2011 Jun;10(2):86-93. doi: 10.1109/TNB.2011.2144998. Epub 2011 Jul 7.
8. Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study.
   Infect Genet Evol. 2012 Jul;12(5):913-21. doi: 10.1016/j.meegid.2011.08.014. Epub 2011 Sep 10.
9. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets.
   BMC Bioinformatics. 2007 Oct 26;8:415. doi: 10.1186/1471-2105-8-415.
10. Orthogonal joint sparse NMF for microarray data analysis.
    J Math Biol. 2019 Jul;79(1):223-247. doi: 10.1007/s00285-019-01355-2. Epub 2019 Apr 19.

Cited by

1. Independent component analysis based gene co-expression network inference (ICAnet) to decipher functional modules for better single-cell clustering and batch integration.
   Nucleic Acids Res. 2021 May 21;49(9):e54. doi: 10.1093/nar/gkab089.
2. SITC cancer immunotherapy resource document: a compass in the land of biomarker discovery.
   J Immunother Cancer. 2020 Dec;8(2). doi: 10.1136/jitc-2020-000705.
3. K1 and K15 of Kaposi's Sarcoma-Associated Herpesvirus Are Partial Functional Homologues of Latent Membrane Protein 2A of Epstein-Barr Virus.
   J Virol. 2015 Jul;89(14):7248-61. doi: 10.1128/JVI.00839-15. Epub 2015 May 6.
4. Statistical methods for the analysis of high-throughput metabolomics data.
   Comput Struct Biotechnol J. 2013 Mar 22;4:e201301009. doi: 10.5936/csbj.201301009. eCollection 2013.
5. Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices.
   PLoS One. 2013 Nov 20;8(11):e79720. doi: 10.1371/journal.pone.0079720. eCollection 2013.
6. iPcc: a novel feature extraction method for accurate disease class discovery and prediction.
   Nucleic Acids Res. 2013 Aug;41(14):e143. doi: 10.1093/nar/gkt343. Epub 2013 Jun 12.
7. Configurable pattern-based evolutionary biclustering of gene expression data.
   Algorithms Mol Biol. 2013 Feb 23;8(1):4. doi: 10.1186/1748-7188-8-4.
8. Co-clustering phenome-genome for phenotype classification and disease gene discovery.
   Nucleic Acids Res. 2012 Oct;40(19):e146. doi: 10.1093/nar/gks615. Epub 2012 Jun 26.
9. Comprehensive evaluation of matrix factorization methods for the analysis of DNA microarray gene expression data.
   BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S8. doi: 10.1186/1471-2105-12-S13-S8. Epub 2011 Nov 30.
10. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels.
    BMC Bioinformatics. 2011 Dec 30;12:496. doi: 10.1186/1471-2105-12-496.
