A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

Authors

Chen Zhenyu, Li Jianping, Wei Liwei

Affiliation

Institute of Policy & Management, Chinese Academy of Sciences, Beijing 100080, China.

Publication information

Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.

DOI:10.1016/j.artmed.2007.07.008

PMID:17851055

Abstract

OBJECTIVE

Gene expression profiling using microarray techniques has recently been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise, and the number of genes far exceeds the number of available samples. This poses a great challenge for machine learning and statistical techniques. The support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to give the user a transparent decision process. Explaining the computed solutions and presenting the extracted knowledge remains a main obstacle for SVM.

MATERIAL AND METHODS

A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem. A shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We also propose a novel rule extraction approach that uses the information provided by the separating hyperplane and the support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce the computational complexity.
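To make the idea of 1-norm shrinkage concrete, the following is a minimal sketch of a linear 1-norm SVM solved as a linear program. It is not the authors' MK-SVM formulation, which the abstract does not spell out; it is a simplified linear stand-in that only illustrates how a 1-norm penalty inside a linear program tends to drive most weights to exactly zero, so that the nonzero weights mark the selected features. The function name `l1_svm_lp`, the penalty parameter `C`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def l1_svm_lp(X, y, C=1.0):
    """Linear 1-norm SVM as an LP (illustrative, not the paper's MK-SVM).

    minimize   ||w||_1 + C * sum(xi)
    subject to y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0
    """
    n, d = X.shape
    # Split free variables into nonnegative parts: w = u - v, b = bp - bn.
    # Variable order: [u (d), v (d), bp, bn, xi (n)], all >= 0.
    cost = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
    Yx = y[:, None] * X
    # Margin constraints rewritten as A_ub @ z <= b_ub:
    #   -y_i * (x_i . (u - v) + bp - bn) - xi_i <= -1
    A_ub = np.hstack([-Yx, Yx, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d] - z[2 * d + 1]
    return w, b


# Synthetic stand-in for expression data: few samples, many "genes",
# with only the first three features carrying class signal.
rng = np.random.default_rng(0)
n, d = 40, 200
y = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, d))
X[:, :3] += 1.5 * y[:, None]

w, b = l1_svm_lp(X, y, C=1.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # features with nonzero weight
acc = np.mean(np.sign(X @ w + b) == y)       # training accuracy only
print("selected features:", selected.size, "of", d)
print("training accuracy:", acc)
```

The accuracy printed here is measured on the training samples of a toy problem, so it says nothing about the results reported for the leukemia and colon tumor datasets.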

RESULTS AND CONCLUSION

Two public gene expression datasets, the leukemia dataset and the colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% on both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
