Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.

Author information

Han Xiaoxu

Affiliation

Department of Mathematics and Bioinformatics Program, Eastern Michigan University, Ypsilanti, MI 48197, USA.

Publication information

Genome Inform. 2008;21:200-11.

PMID: 19425159

Abstract

Robust identification of cancer molecular patterns from microarray data plays an essential role in modern clinical oncology and also presents a challenge for statistical learning. Although principal component analysis (PCA) is a widely used feature selection algorithm in microarray analysis, its holistic mechanism prevents it from capturing the latent local data structure needed for subsequent cancer molecular pattern identification. In this study, we investigate the benefit of enforcing non-negativity constraints on PCA and propose a nonnegative principal component analysis (NPCA) based classification algorithm for cancer molecular pattern analysis of gene expression data. The algorithm classifies meta-samples of the input cancer data with support vector machines (SVM) or other classic supervised learning algorithms. The meta-samples are low-dimensional projections of the original cancer samples onto a purely additive meta-gene subspace generated by the NPCA-induced nonnegative matrix factorization (NMF). We report leading classification results for the NPCA-SVM algorithm in cancer molecular pattern identification on five benchmark gene expression datasets, under 100 trials of 50% hold-out cross-validation and under leave-one-out cross-validation. We demonstrate the superiority of NPCA-SVM by direct comparison with seven classification algorithms (SVM, PCA-SVM, KPCA-SVM, NMF-SVM, LLE-SVM, PCA-LDA and k-NN) on the five cancer datasets in terms of classification rates, sensitivities and specificities. NPCA-SVM overcomes the over-fitting problem associated with SVM-based classification of gene expression data under a Gaussian kernel. As a more robust, high-performance classifier, NPCA-SVM can replace the general SVM and k-NN classifiers in cancer biomarker discovery to capture more meaningful oncogenes.
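
The evaluation protocol named in the abstract (repeated 50% hold-out trials scored by classification rate, sensitivity and specificity) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the function names, the stratified splitting, the toy data, and the nearest-centroid classifier, which is only a placeholder and is not the NPCA-SVM method. The abstract does not say how the splits were drawn or which class was treated as positive.

```python
import random
from statistics import mean


def confusion_rates(y_true, y_pred, positive=1):
    """Return (classification rate, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    rate = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return rate, sensitivity, specificity


def stratified_half_split(y, rng):
    """Split sample indices 50/50 within each class (assumed; returns train, test)."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier (NOT NPCA-SVM): pick the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, v in zip(X_train, y_train) if v == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def repeated_holdout(X, y, fit_predict, trials=100, seed=0):
    """Average the three rates over `trials` independent 50% hold-out splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        train, test = stratified_half_split(y, rng)
        y_pred = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        results.append(confusion_rates([y[i] for i in test], y_pred))
    return tuple(mean(col) for col in zip(*results))


# Toy, well-separated two-class data (hypothetical, not from the paper).
X_toy = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
         [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(repeated_holdout(X_toy, y_toy, nearest_centroid_fit_predict))
```

Any classifier with the same `fit_predict(X_train, y_train, X_test)` shape could be passed in instead, for example one that first projects samples to meta-samples and then trains an SVM. Averaging over many random splits reduces the dependence of the reported rates on a single lucky or unlucky partition.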

Similar articles

1
Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.
Genome Inform. 2008;21:200-11.
2
Nonnegative principal component analysis for cancer molecular pattern discovery.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):537-49. doi: 10.1109/TCBB.2009.36.
3
A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.
Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.
4
CARSVM: a class association rule-based classification framework and its application to gene expression data.
Artif Intell Med. 2008 Sep;44(1):7-25. doi: 10.1016/j.artmed.2008.05.002. Epub 2008 Jun 30.
5
Metagenes and molecular pattern discovery using matrix factorization.
Proc Natl Acad Sci U S A. 2004 Mar 23;101(12):4164-9. doi: 10.1073/pnas.0308531101. Epub 2004 Mar 11.
6
Tumor classification ranking from microarray data.
BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.
7
Tumor clustering using nonnegative matrix factorization with gene selection.
IEEE Trans Inf Technol Biomed. 2009 Jul;13(4):599-607. doi: 10.1109/TITB.2009.2018115. Epub 2009 Apr 14.
8
Gene selection for classification of cancers using probabilistic model building genetic algorithm.
Biosystems. 2005 Dec;82(3):208-25. doi: 10.1016/j.biosystems.2005.07.003. Epub 2005 Aug 22.
9
Tumor classification based on non-negative matrix factorization using gene expression data.
IEEE Trans Nanobioscience. 2011 Jun;10(2):86-93. doi: 10.1109/TNB.2011.2144998. Epub 2011 Jul 7.
10
Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.
BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

Cited by

1
Multi-view manifold regularized compact low-rank representation for cancer samples clustering on multi-omics data.
BMC Bioinformatics. 2022 Jan 20;22(Suppl 12):334. doi: 10.1186/s12859-021-04220-6.
2
Diagnostic biases in translational bioinformatics.
BMC Med Genomics. 2015 Aug 1;8:46. doi: 10.1186/s12920-015-0116-y.
3
Identification of biomarkers that distinguish chemical contaminants based on gene expression profiles.
BMC Genomics. 2014 Mar 31;15:248. doi: 10.1186/1471-2164-15-248.