A new regularized least squares support vector regression for gene selection.

Authors

Chen Pei-Chun, Huang Su-Yun, Chen Wei J, Hsiao Chuhsing K

Affiliation

Bioinformatics and Biostatistics Core Laboratory, National Taiwan University, Taipei, Taiwan, Republic of China.

Publication information

BMC Bioinformatics. 2009 Feb 3;10:44. doi: 10.1186/1471-2105-10-44.

DOI:10.1186/1471-2105-10-44

PMID:19187562

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2669483/

Abstract

BACKGROUND

Selecting influential genes from microarray data is often hampered by a large number of genes and a relatively small number of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution of each individual subject equally. This equal-contribution assumption cannot account for possible dependence among subjects who are similarly associated with the disease, and it may restrict the selection of influential genes.

RESULTS

A novel approach to gene selection is proposed based on kernel similarities and kernel weights. We do not assume that subjects contribute uniformly. Weights are calculated via regularized least squares support vector regression (RLS-SVR) of class levels on kernel similarities and are used to weight subject contributions. The cumulative sums of weighted expression levels are then ranked to select responsible genes. These procedures also work for multiclass classification. We demonstrate this algorithm on acute leukemia, colon cancer, small round blue cell tumors of childhood, breast cancer, and lung cancer studies, using kernel Fisher discriminant analysis and support vector machines as classifiers. Other procedures are compared as well.
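The abstract gives no formulas, so the following Python sketch only illustrates the general shape of the procedure described above for the binary case: compute kernel similarities between subjects, regress the class labels on those similarities with a regularized least squares fit to obtain one weight per subject, then rank genes by their weighted sums of expression levels. The Gaussian kernel, the ridge-type system `(K + lam * I) alpha = y`, the default bandwidth, the use of absolute values in the scores, and all function and parameter names are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Pairwise Gaussian similarities between subjects (rows of X)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def subject_weights(K, y, lam):
    """Regularized least squares fit of labels on kernel similarities.

    Solves (K + lam * I) alpha = y. This ridge-type form is an assumption,
    not necessarily the RLS-SVR formulation used in the paper.
    """
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def rank_genes(X, y, sigma=None, lam=1.0):
    """Rank genes (columns of X) by weighted sums of expression levels.

    X: array of shape (n_subjects, n_genes); y: labels coded as +1 / -1.
    Returns gene indices from highest to lowest score, the scores,
    and the per-subject weights.
    """
    n_subjects, n_genes = X.shape
    if sigma is None:
        sigma = np.sqrt(n_genes)  # hypothetical default bandwidth
    K = gaussian_kernel(X, sigma)
    alpha = subject_weights(K, y, lam)
    # One score per gene: subject-weighted sum of its expression levels.
    scores = np.abs(X.T @ alpha)
    order = np.argsort(-scores)
    return order, scores, alpha


if __name__ == "__main__":
    # Synthetic data: 40 subjects, 50 genes, gene 3 shifted by class label.
    rng = np.random.default_rng(0)
    y = np.repeat([1.0, -1.0], 20)
    X = rng.normal(size=(40, 50))
    X[:, 3] += 3.0 * y
    order, scores, alpha = rank_genes(X, y)
    print("top-ranked gene indices:", order[:5])
```

In this sketch, subjects whose kernel similarities make them harder to fit receive weights of different magnitude, so they no longer contribute equally to each gene's score. A multiclass variant would need a label coding scheme, which the abstract does not specify.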

CONCLUSION

This approach is easy to implement and computationally fast for both binary and multiclass problems. The gene set provided by the RLS-SVR weight-based approach contains fewer genes and achieves higher accuracy than those of other procedures.
