
A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification.

Authors

Pamukçu Esra, Bozdogan Hamparsum, Çalık Sinan

Affiliations

Department of Statistics, Faculty of Science, Firat University, 23119 Elazig, Turkey.

Department of Business Analytics and Statistics, The University of Tennessee, Knoxville, TN 37996, USA.

Publication

Comput Math Methods Med. 2015;2015:370640. doi: 10.1155/2015/370640. Epub 2015 Feb 19.

DOI:10.1155/2015/370640
PMID:25838836
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4370236/
Abstract

Gene expression data are typically large, complex, and highly noisy. Their dimension is high, with several thousand genes (i.e., features) but only a limited number of observations (i.e., samples). Although classical principal component analysis (PCA) is widely used as a standard first step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings for data sets with undersized samples, because the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach that uses the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to retain, we further introduce and develop the celebrated Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), and Bozdogan's information-theoretic measure of complexity (ICOMP) criterion. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach, whose hybridized smoothed covariance matrix estimators do not degenerate and can therefore be used to perform PPCA, reduce the dimension, and carry out supervised classification of cancer groups in high dimensions.
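The pipeline the abstract describes can be sketched in plain Python: smooth a singular covariance, fit PPCA, and score each candidate number of components with an information criterion. This is a minimal sketch under stated assumptions, not the authors' implementation.

- The smoother is a generic convex shrinkage toward a scaled identity. It stands in for the paper's maximum entropy and hybridized estimators, which are not reproduced here.
- The PPCA likelihood is Tipping and Bishop's closed form.
- The parameter count includes the mean vector.
- ICOMP is omitted.

Everything works on the eigenvalues of the sample covariance, which are assumed to be precomputed. The example spectrum is hypothetical.

```python
import math


def log_det(eigs):
    """Log-determinant from eigenvalues; -inf signals a singular matrix."""
    if min(eigs) <= 0.0:
        return float("-inf")
    return sum(math.log(lam) for lam in eigs)


def smoothed_eigenvalues(eigs, alpha):
    """Eigenvalues of (1 - alpha) * S + alpha * (tr(S) / p) * I.

    Assumption: generic convex shrinkage toward a scaled identity, used as a
    stand-in; it is NOT the paper's maximum entropy covariance estimator.
    S and I share eigenvectors, so shrinking the eigenvalues is sufficient.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    mean_eig = sum(eigs) / len(eigs)
    return [(1.0 - alpha) * lam + alpha * mean_eig for lam in eigs]


def ppca_loglik(eigs, n, q):
    """Maximised PPCA log-likelihood (Tipping & Bishop closed form)."""
    lam = sorted(eigs, reverse=True)
    p = len(lam)
    if not 1 <= q < p:
        raise ValueError("need 1 <= q < p")
    sigma2 = sum(lam[q:]) / (p - q)  # ML noise variance = mean discarded eigenvalue
    return -0.5 * n * (
        p * math.log(2.0 * math.pi)
        + sum(math.log(v) for v in lam[:q])
        + (p - q) * math.log(sigma2)
        + p
    )


def ppca_num_params(p, q):
    """Mean (p) + loadings up to rotation (pq - q(q-1)/2) + noise variance (1).

    Assumption: the mean is counted; the paper's exact count may differ.
    """
    return p + p * q - q * (q - 1) // 2 + 1


def score_components(eigs, n):
    """Return [(q, AIC, CAIC)] for every q = 1 .. p - 1."""
    p = len(eigs)
    rows = []
    for q in range(1, p):
        ll = ppca_loglik(eigs, n, q)
        k = ppca_num_params(p, q)
        aic = -2.0 * ll + 2.0 * k
        caic = -2.0 * ll + k * (math.log(n) + 1.0)  # Bozdogan's CAIC penalty
        rows.append((q, aic, caic))
    return rows


# Hypothetical example: n = 5 samples, p = 8 genes, so the centred sample
# covariance has rank <= n - 1 = 4 and four zero eigenvalues (singular).
n = 5
raw = [9.0, 4.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
smooth = smoothed_eigenvalues(raw, alpha=0.1)

rows = score_components(smooth, n)
best_aic = min(rows, key=lambda r: r[1])[0]
best_caic = min(rows, key=lambda r: r[2])[0]
print("log|S| raw:", log_det(raw), " smoothed:", round(log_det(smooth), 3))
print("q chosen by AIC:", best_aic, " by CAIC:", best_caic)
```

The smoothing step preserves the trace while lifting every zero eigenvalue above zero, so the log-determinant and the PPCA likelihood become finite. Each extra component adds parameters. CAIC's per-parameter penalty, log n + 1, exceeds AIC's 2 whenever n ≥ 3. Under those two conditions CAIC never retains more components than AIC in this sketch.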


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9f0/4370236/623049b141ae/CMMM2015-370640.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9f0/4370236/9f69b4c5ff84/CMMM2015-370640.001.jpg

Similar articles

1. A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification.
   Comput Math Methods Med. 2015;2015:370640. doi: 10.1155/2015/370640. Epub 2015 Feb 19.
2. Robust PCA and classification in biosciences.
   Bioinformatics. 2004 Jul 22;20(11):1728-36. doi: 10.1093/bioinformatics/bth158. Epub 2004 Feb 26.
3. Classification for high-throughput data with an optimal subset of principal components.
   Comput Biol Chem. 2009 Oct;33(5):408-13. doi: 10.1016/j.compbiolchem.2009.07.017. Epub 2009 Aug 18.
4. Feature selection and nearest centroid classification for protein mass spectrometry.
   BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.
5. Bayesian supervised dimensionality reduction.
   IEEE Trans Cybern. 2013 Dec;43(6):2179-89. doi: 10.1109/TCYB.2013.2245321.
6. A granular computing approach to gene selection.
   Biomed Mater Eng. 2014;24(1):1307-14. doi: 10.3233/BME-130933.
7. A comparative investigation on subspace dimension determination.
   Neural Netw. 2004 Oct-Nov;17(8-9):1051-9. doi: 10.1016/j.neunet.2004.07.005.
8. Minimizing nearest neighbor classification error for nonparametric dimension reduction.
   IEEE Trans Neural Netw Learn Syst. 2014 Aug;25(8):1588-94. doi: 10.1109/TNNLS.2013.2294547.
9. Applications of support vector machines to cancer classification with microarray data.
   Int J Neural Syst. 2005 Dec;15(6):475-84. doi: 10.1142/S0129065705000396.
10. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.
    Bioinformatics. 2006 Aug 1;22(15):1855-62. doi: 10.1093/bioinformatics/btl190. Epub 2006 May 18.

Cited by

1. Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data.
   Med Biol Eng Comput. 2018 Apr;56(4):709-720. doi: 10.1007/s11517-017-1722-y. Epub 2017 Sep 11.
2. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.
   PLoS One. 2016 Dec 9;11(12):e0167504. doi: 10.1371/journal.pone.0167504. eCollection 2016.
