Informative gene selection and direct classification of tumor based on Chi-square test of pairwise gene interactions.

Author information

Zhang Hongyan, Li Lanzhi, Luo Chao, Sun Congwei, Chen Yuan, Dai Zhijun, Yuan Zheming

Affiliations

Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha 410128, China ; College of Information Science and Technology, Hunan Agricultural University, Changsha 410128, China ; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Changsha 410128, China.

Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Changsha 410128, China ; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Changsha 410128, China.

Publication information

Biomed Res Int. 2014;2014:589290. doi: 10.1155/2014/589290. Epub 2014 Jul 23.

DOI:10.1155/2014/589290

PMID:25140319

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4130026/

Abstract

In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ²-IRG-DC). First, we obtained the weighted integrated rank of gene importance from chi-square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ²-DC) within the training set to obtain informative genes. Finally, we determined the accuracy of independent test data by utilizing the genes obtained above with χ²-DC. Furthermore, we analyzed the robustness of χ²-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods, and the accuracy of different classifiers. An independent test of ten multiclass tumor gene-expression datasets showed that χ²-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ²-IRG-DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also had good performance in χ²-DC.
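
The first step described in the abstract, scoring genes by chi-square tests on single genes and on gene pairs, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how expression values are discretized, how single and pairwise scores are weighted into the integrated rank, or how the direct classifier works. The median-split binarization, the function names, and the toy data below are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def chi_square_stat(rows, cols):
    """Pearson chi-square statistic for two parallel label sequences."""
    n = len(rows)
    observed = Counter(zip(rows, cols))
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def binarize(values):
    """Assumed discretization (not from the paper): 1 if above the median."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def single_gene_score(expr, labels):
    """Chi-square of class label vs. one binarized gene."""
    return chi_square_stat(labels, binarize(expr))


def pair_score(expr_a, expr_b, labels):
    """Chi-square of class label vs. the joint state of two binarized genes."""
    joint = list(zip(binarize(expr_a), binarize(expr_b)))
    return chi_square_stat(labels, joint)


# Toy data: 8 samples, 2 classes, 3 hypothetical genes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates the classes alone
    "g2": [1.0, 3.0, 1.1, 3.1, 1.2, 3.2, 0.9, 2.9],  # uninformative alone
    "g3": [1.0, 3.0, 1.1, 3.1, 3.2, 1.2, 2.9, 0.9],  # uninformative alone
}

single = {name: single_gene_score(x, labels) for name, x in genes.items()}
ranking = sorted(single, key=single.get, reverse=True)
pair_g2_g3 = pair_score(genes["g2"], genes["g3"], labels)
```

In this toy setup, g2 and g3 each score zero on the single-gene test, yet their joint state separates the two classes, so the pairwise score is high. That is the kind of signal a pairwise-interaction test can capture and a purely single-gene ranking would miss.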

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80b3/4130026/89a8176472ed/BMRI2014-589290.001.jpg

Similar articles

Informative gene selection and direct classification of tumor based on Chi-square test of pairwise gene interactions.

Biomed Res Int. 2014;2014:589290. doi: 10.1155/2014/589290. Epub 2014 Jul 23.

TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection.

BMC Med Genomics. 2013;6 Suppl 1(Suppl 1):S3. doi: 10.1186/1755-8794-6-S1-S3. Epub 2013 Jan 23.

Informative gene selection and the direct classification of tumors based on relative simplicity.

BMC Bioinformatics. 2016 Jan 20;17:44. doi: 10.1186/s12859-016-0893-0.

Stable feature selection and classification algorithms for multiclass microarray data.

Biol Direct. 2012 Oct 2;7:33. doi: 10.1186/1745-6150-7-33.

Improving accuracy for cancer classification with a new algorithm for genes selection.

BMC Bioinformatics. 2012 Nov 13;13:298. doi: 10.1186/1471-2105-13-298.

Accurate molecular classification of cancer using simple rules.

BMC Med Genomics. 2009 Oct 30;2:64. doi: 10.1186/1755-8794-2-64.

Simultaneous gene clustering and subset selection for sample classification via MDL.

Bioinformatics. 2003 Jun 12;19(9):1100-9. doi: 10.1093/bioinformatics/btg039.

An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data.

Comb Chem High Throughput Screen. 2018;21(9):631-645. doi: 10.2174/1386207322666181220124756.

Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.

PLoS One. 2016 Dec 9;11(12):e0167504. doi: 10.1371/journal.pone.0167504. eCollection 2016.

Feature selection and molecular classification of cancer using genetic programming.

Neoplasia. 2007 Apr;9(4):292-303. doi: 10.1593/neo.07121.

Cited by

MSFC: a new feature construction method for accurate diagnosis of mass spectrometry data.

Sci Rep. 2023 Sep 21;13(1):15694. doi: 10.1038/s41598-023-42395-5.

Chi-MIC-share: a new feature selection algorithm for quantitative structure-activity relationship models.

RSC Adv. 2020 May 27;10(34):19852-19860. doi: 10.1039/d0ra00061b. eCollection 2020 May 26.

Computational advances of tumor marker selection and sample classification in cancer proteomics.

Comput Struct Biotechnol J. 2020 Jul 17;18:2012-2025. doi: 10.1016/j.csbj.2020.07.009. eCollection 2020.

Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data.

Genes Genomics. 2019 Nov;41(11):1301-1313. doi: 10.1007/s13258-019-00859-x. Epub 2019 Aug 19.

MYL6B, a myosin light chain, promotes MDM2-mediated p53 degradation and drives HCC development.

J Exp Clin Cancer Res. 2018 Feb 13;37(1):28. doi: 10.1186/s13046-018-0693-7.

References

TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection.

BMC Med Genomics. 2013;6 Suppl 1(Suppl 1):S3. doi: 10.1186/1755-8794-6-S1-S3. Epub 2013 Jan 23.

A novel approach for biomarker selection and the integration of repeated measures experiments from two assays.

BMC Bioinformatics. 2012 Dec 6;13:325. doi: 10.1186/1471-2105-13-325.

Multiple-platform data integration method with application to combined analysis of microarray and proteomic data.

BMC Bioinformatics. 2012 Dec 2;13:320. doi: 10.1186/1471-2105-13-320.

Improving accuracy for cancer classification with a new algorithm for genes selection.

BMC Bioinformatics. 2012 Nov 13;13:298. doi: 10.1186/1471-2105-13-298.

The top-scoring 'N' algorithm: a generalized relative expression classification method from small numbers of biomolecules.

BMC Bioinformatics. 2012 Sep 11;13:227. doi: 10.1186/1471-2105-13-227.

Interaction-based feature selection and classification for high-dimensional biological data.

Bioinformatics. 2012 Nov 1;28(21):2834-42. doi: 10.1093/bioinformatics/bts531. Epub 2012 Sep 3.

Gene selection and classification for cancer microarray data based on machine learning and similarity measures.

BMC Genomics. 2011 Dec 23;12 Suppl 5(Suppl 5):S1. doi: 10.1186/1471-2164-12-S5-S1.

Uniform approximation is more appropriate for Wilcoxon Rank-Sum Test in gene set analysis.

PLoS One. 2012;7(2):e31505. doi: 10.1371/journal.pone.0031505. Epub 2012 Feb 7.

SVM-T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles.

Biochem Biophys Res Commun. 2012 Mar 9;419(2):148-53. doi: 10.1016/j.bbrc.2012.01.087. Epub 2012 Jan 28.

The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures.

PLoS One. 2011;6(12):e28210. doi: 10.1371/journal.pone.0028210. Epub 2011 Dec 21.
