Biomarker identification and cancer classification based on microarray data using Laplace naive Bayes model with mean shrinkage.

Affiliation

Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1649-62. doi: 10.1109/TCBB.2012.105.

DOI: 10.1109/TCBB.2012.105

PMID: 22868679

Abstract

Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, genes that share a biological pathway can be highly correlated. Moreover, gene expression data sets may contain outliers arising from chemical or electrical causes. A good gene selection method should therefore take group effects into account and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution, rather than the normal distribution, is used as the conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is an L1 penalty imposed on the mean of each class to achieve automatic feature selection. The objective function of the proposed model is piecewise linear with respect to the mean of each class, so its optimum can be found simply by evaluating the function at the breakpoints. An efficient algorithm is designed to estimate the model parameters. A new strategy that uses the number of selected features to control the regularization parameter is introduced. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many biomarkers identified with our method have been verified in biochemical or biomedical research. An analysis of the biological and functional correlation of the genes based on Gene Ontology (GO) terms shows that the proposed method guarantees the simultaneous selection of highly correlated genes.
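The breakpoint idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the form used here is an assumption: for one gene in one class, with expression values centered so that a mean of zero means "feature not selected", the negative Laplace log-likelihood plus an L1 penalty is `sum(|x_i - mu|) / scale + lam * |mu|`. The names `penalized_laplace_objective`, `shrunken_mean`, `lam`, and `scale` are hypothetical and are not taken from the paper.

```python
def penalized_laplace_objective(mu, xs, lam, scale=1.0):
    """Assumed objective: Laplace negative log-likelihood in mu
    (constants dropped) plus an L1 penalty on mu."""
    return sum(abs(x - mu) for x in xs) / scale + lam * abs(mu)


def shrunken_mean(xs, lam, scale=1.0):
    """Minimize the piecewise linear objective by checking its breakpoints.

    The objective is convex and piecewise linear in mu, with kinks at
    every sample value and at 0, so a minimizer lies at one of them.
    """
    # Put 0.0 first: min() returns the first minimizer, so ties resolve
    # toward zero, i.e. toward dropping the feature.
    candidates = [0.0] + sorted(xs)
    return min(
        candidates,
        key=lambda mu: penalized_laplace_objective(mu, xs, lam, scale),
    )


# One gene in one class, with a single outlier (8.0).
values = [1.0, 1.2, 0.9, 1.1, 8.0]
print(shrunken_mean(values, lam=0.5))  # 1.1: the median, barely moved by the outlier
print(shrunken_mean(values, lam=6.0))  # 0.0: the penalty removes the feature
```

With a small penalty the estimate stays at the sample median, which is why a Laplace likelihood is less sensitive to the outlier than a Gaussian one; with a large penalty the class mean is shrunk exactly to zero, which is the feature selection effect described above.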

Similar articles

Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

Bioinformatics. 2006 Oct 1;22(19):2348-55. doi: 10.1093/bioinformatics/btl386. Epub 2006 Jul 14.

A centroid-based gene selection method for microarray data classification.

J Theor Biol. 2016 Jul 7;400:32-41. doi: 10.1016/j.jtbi.2016.03.034. Epub 2016 Apr 4.

Gene selection for microarray gene expression classification using Bayesian Lasso quantile regression.

Comput Biol Med. 2018 Jun 1;97:145-152. doi: 10.1016/j.compbiomed.2018.04.018. Epub 2018 Apr 27.

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.

BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67.

Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data.

Bioinformatics. 2005 May 15;21(10):2394-402. doi: 10.1093/bioinformatics/bti319. Epub 2005 Feb 15.

Cancer classification from gene expression data by NPPC ensemble.

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):659-71. doi: 10.1109/TCBB.2010.36.

An efficient statistical feature selection approach for classification of gene expression data.

J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.

Cancer classification and prediction using logistic regression with Bayesian gene selection.

J Biomed Inform. 2004 Aug;37(4):249-59. doi: 10.1016/j.jbi.2004.07.009.

A GMM-IG framework for selecting genes as expression panel biomarkers.

Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.

Cited by

Folded concave penalized learning of high-dimensional MRI data in Parkinson's disease.

J Neurosci Methods. 2021 Jun 1;357:109157. doi: 10.1016/j.jneumeth.2021.109157. Epub 2021 Mar 26.

StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis.

BMC Genomics. 2019 Dec 20;20(Suppl 11):949. doi: 10.1186/s12864-019-6283-z.

An elastic-net logistic regression approach to generate classifiers and gene signatures for types of immune cells and T helper cell subsets.

BMC Bioinformatics. 2019 Aug 22;20(1):433. doi: 10.1186/s12859-019-2994-z.

Integration of 24 Feature Types to Accurately Detect and Predict Seizures Using Scalp EEG Signals.

Sensors (Basel). 2018 Apr 28;18(5):1372. doi: 10.3390/s18051372.

Identification of genes associated with renal cell carcinoma using gene expression profiling analysis.

Oncol Lett. 2016 Jul;12(1):73-78. doi: 10.3892/ol.2016.4573. Epub 2016 May 16.

Folded concave penalized learning in identifying multimodal MRI marker for Parkinson's disease.

J Neurosci Methods. 2016 Aug 1;268:1-6. doi: 10.1016/j.jneumeth.2016.04.016. Epub 2016 Apr 19.

Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

PLoS One. 2015 Mar 30;10(3):e0120364. doi: 10.1371/journal.pone.0120364. eCollection 2015.

Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.

PLoS One. 2013 Jun 17;8(6):e66256. doi: 10.1371/journal.pone.0066256. Print 2013.
