Informative gene selection and the direct classification of tumors based on relative simplicity.

Author information

Chen Yuan, Wang Lifeng, Li Lanzhi, Zhang Hongyan, Yuan Zheming

Affiliations

Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Changsha, China.

Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Hunan Agricultural University, Changsha, China.

Publication information

BMC Bioinformatics. 2016 Jan 20;17:44. doi: 10.1186/s12859-016-0893-0.

Abstract

BACKGROUND

Selecting a parsimonious set of informative genes to build a classifier with high generalization performance is the most important task in the analysis of tumor microarray expression data. Many existing gene-pair evaluation methods use only one of two strategies, vertical comparison or horizontal comparison, and therefore cannot highlight the diverse patterns of gene pairs, while individual-gene-ranking methods ignore redundancy and synergy among genes.

RESULTS

Here we propose a novel score measure named relative simplicity (RS). We evaluated gene pairs by integrating vertical comparison with horizontal comparison, and finally built an RS-based direct classifier (RS-based DC) on a set of informative genes capable of binary discrimination, using a paired-votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of the new method. Compared with the nine reference models, RS-based DC achieved the highest average independent test accuracy (91.40%), the best generalization performance and the smallest average number of informative genes (20.56). Compared with the four reference feature selection methods, RS also achieved the highest average test accuracy with three classifiers (Naïve Bayes, k-Nearest Neighbor and Support Vector Machine), and only RS improved the performance of SVM.
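The paired-votes strategy combines binary (two-class) decisions into a multi-class prediction. The abstract does not give the RS score or the exact DC decision rule, so the sketch below only illustrates generic one-vs-one majority voting under stated assumptions: the per-pair rule `make_rule` (vote for one class if gene i is expressed above gene j in the sample), the class labels and the expression values are all hypothetical placeholders, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Sample = Sequence[float]
# A binary rule for one class pair: it returns whichever of its two labels it votes for.
PairRule = Callable[[Sample], str]


def paired_vote(sample: Sample,
                classes: Sequence[str],
                rules: Dict[Tuple[str, str], PairRule]) -> str:
    """Return the class that wins the most one-vs-one contests.

    Every unordered pair (a, b), taken in the order given by `classes`,
    has its own binary rule. Ties go to the class listed first.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        winner = rules[(a, b)](sample)
        if winner not in (a, b):
            raise ValueError(f"rule for {(a, b)} voted for {winner!r}")
        votes[winner] += 1
    # max() keeps the first maximal element, so list order breaks ties
    return max(classes, key=lambda c: votes[c])


def make_rule(a: str, b: str, i: int, j: int) -> PairRule:
    """Hypothetical rule: vote `a` if gene i is expressed above gene j, else `b`."""
    return lambda x: a if x[i] > x[j] else b


# Toy setup (hypothetical): three classes, three genes, one gene pair per class pair.
CLASSES = ["A", "B", "C"]
RULES = {
    ("A", "B"): make_rule("A", "B", 0, 1),
    ("A", "C"): make_rule("A", "C", 0, 2),
    ("B", "C"): make_rule("B", "C", 1, 2),
}

print(paired_vote([5.0, 1.0, 3.0], CLASSES, RULES))  # → A
```

Because every class meets every other class exactly once, a k-class problem needs k(k-1)/2 binary rules; with three classes that is the three entries in the toy `RULES` table.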

CONCLUSIONS

Diverse patterns of gene pairs can be highlighted more fully by integrating the vertical and horizontal comparison strategies. The DC core classifier can effectively control over-fitting. The RS-based feature selection method combined with the DC classifier can lead to more robust selection of informative genes and more robust classification accuracy.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1eb/4721022/8d5368dcdbcf/12859_2016_893_Fig1_HTML.jpg

Similar articles

1
TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection.
BMC Med Genomics. 2013;6 Suppl 1(Suppl 1):S3. doi: 10.1186/1755-8794-6-S1-S3. Epub 2013 Jan 23.
2
Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification.
Genomics Proteomics Bioinformatics. 2017 Dec;15(6):389-395. doi: 10.1016/j.gpb.2017.08.002. Epub 2017 Dec 12.
3
Improving accuracy for cancer classification with a new algorithm for genes selection.
BMC Bioinformatics. 2012 Nov 13;13:298. doi: 10.1186/1471-2105-13-298.
4
An efficient statistical feature selection approach for classification of gene expression data.
J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.
5
Feature weight estimation for gene selection: a local hyperlinear learning approach.
BMC Bioinformatics. 2014 Mar 14;15:70. doi: 10.1186/1471-2105-15-70.
6
A novel gene selection algorithm for cancer classification using microarray datasets.
BMC Med Genomics. 2019 Jan 15;12(1):10. doi: 10.1186/s12920-018-0447-6.
7
A comparison of methods for three-class mammograms classification.
Technol Health Care. 2017 Aug 9;25(4):657-670. doi: 10.3233/THC-160805.

Cited by

1
Novel ratio-expressions of genes enables estimation of wound age in contused skeletal muscle.
Int J Legal Med. 2024 Jan;138(1):197-206. doi: 10.1007/s00414-023-03095-x. Epub 2023 Oct 7.
2
A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure.
Biomed Res Int. 2019 Nov 4;2019:9864213. doi: 10.1155/2019/9864213. eCollection 2019.
3
A fast approach to detect gene-gene synergy.
Sci Rep. 2017 Nov 27;7(1):16437. doi: 10.1038/s41598-017-16748-w.
4
A Computational Method of Defining Potential Biomarkers based on Differential Sub-Networks.
Sci Rep. 2017 Oct 30;7(1):14339. doi: 10.1038/s41598-017-14682-5.
