ANMM4CBR: a case-based reasoning method for gene expression data classification.

Authors

Yao Bangpeng, Li Shao

Affiliation

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China.

Publication information

Algorithms Mol Biol. 2010 Jan 6;5:14. doi: 10.1186/1748-7188-5-14.

DOI:10.1186/1748-7188-5-14

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2843690/

Abstract

BACKGROUND

Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. However, the "curse of dimensionality" and noise in the data undermine the performance of many algorithms.

METHOD

To obtain a robust classifier, we propose a novel method, Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR). ANMM4CBR classifies samples by case-based reasoning (CBR), a paradigm well suited to microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. To select the most informative genes, we perform feature selection by additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. This feature selection method is robust to noise in the data.
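The general idea of margin-driven gene selection followed by case retrieval can be illustrated with a small sketch. This is not the authors' implementation (their source code is the additional file of the paper). It assumes a simplified nearest-neighbor margin (squared distance to the nearest sample of another class minus squared distance to the nearest sample of the same class), a plain greedy forward search standing in for the additive optimization, and it omits the gene pre-selection and sample clustering steps entirely. All function names and the toy data are hypothetical.

```python
# Illustrative sketch only: simplified margin criterion + nearest-case retrieval.
# Names (sq_dist, margin, select_features, classify) and the data are hypothetical.

def sq_dist(a, b, feats):
    """Squared Euclidean distance restricted to the feature indices in feats."""
    return sum((a[f] - b[f]) ** 2 for f in feats)

def margin(X, y, feats):
    """Sum over samples of (nearest other-class distance - nearest same-class distance).

    Assumes every class has at least two samples and at least two classes exist.
    """
    total = 0.0
    for i, xi in enumerate(X):
        hit = min(sq_dist(xi, X[j], feats)
                  for j in range(len(X)) if j != i and y[j] == y[i])
        miss = min(sq_dist(xi, X[j], feats)
                   for j in range(len(X)) if y[j] != y[i])
        total += miss - hit
    return total

def select_features(X, y, k):
    """Greedily add, one at a time, the feature that most increases the margin."""
    selected = []
    remaining = list(range(len(X[0])))
    for _ in range(k):
        best = max(remaining, key=lambda f: margin(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def classify(X, y, feats, query):
    """Retrieve the most similar stored case and reuse its label."""
    nearest = min(range(len(X)), key=lambda i: sq_dist(X[i], query, feats))
    return y[nearest]

# Toy "expression" data: gene 0 separates the classes, gene 1 is noise,
# gene 2 is weakly informative.
X = [
    [0.1, 5.0, 0.3],
    [0.2, 1.0, 0.1],
    [0.0, 3.0, 0.2],
    [1.0, 4.8, 0.6],
    [0.9, 1.2, 0.4],
    [1.1, 3.1, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

genes = select_features(X, y, 2)
label = classify(X, y, genes, [0.95, 2.0, 0.45])
print("selected genes:", genes, "predicted class:", label)
```

On this toy data the noisy gene lowers the margin, so the greedy search skips it. A real microarray setting would involve thousands of genes, which is one reason a pre-selection step such as the one the abstract mentions would matter in practice.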

RESULTS

The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contains a high level of noise.

AVAILABILITY

The source code is attached as an additional file of this paper.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05ad/2843690/f3a2bcd70dc5/1748-7188-5-14-1.jpg
