A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

Author information

Chen Zhenyu, Li Jianping, Wei Liwei

Affiliation

Institute of Policy & Management, Chinese Academy of Sciences, Beijing 100080, China.

Publication information

Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.

Abstract

OBJECTIVE

Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machines (SVMs) have been used successfully to classify gene expression data of cancer tissue. In the medical field, however, it is crucial to provide the user with a transparent decision process, and explaining the computed solutions and presenting the extracted knowledge has become a main obstacle for SVM.

MATERIAL AND METHODS

A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanatory capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem, and we propose a shrinkage approach, 1-norm-based linear programming, to obtain sparse parameters and the corresponding selected features. We also propose a novel rule extraction approach that uses the information provided by the separating hyperplane and the support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce computational complexity.
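The abstract does not spell out the optimization, so the following is only a minimal sketch of the idea that feature selection becomes multiple-parameter learning: one kernel per gene, combined with weights beta_d, so that any gene whose weight is driven to zero (in the paper, by the 1-norm linear program) drops out of the decision function. The per-gene Gaussian kernel, the non-negative weights, the toy numbers and all function names are assumptions for illustration; the linear program that actually learns the sparse weights is not implemented here.

```python
import math
from typing import List, Sequence


def gene_kernel(a: float, b: float, sigma: float = 1.0) -> float:
    """Gaussian kernel on one gene's expression value (an assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def multiple_kernel(x: Sequence[float], z: Sequence[float], beta: Sequence[float]) -> float:
    """K(x, z) = sum_d beta_d * k(x_d, z_d): one kernel per gene, weighted by beta_d.

    Genes with beta_d == 0 are skipped, so they have no influence on K.
    """
    return sum(w * gene_kernel(xd, zd) for w, xd, zd in zip(beta, x, z) if w != 0.0)


def selected_genes(beta: Sequence[float], tol: float = 1e-8) -> List[int]:
    """Indices of genes whose (hypothetical) kernel weight survived the sparsity step."""
    return [d for d, w in enumerate(beta) if w > tol]


def decision(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    alpha_y: Sequence[float],
    beta: Sequence[float],
    bias: float,
) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the sign of f gives the class."""
    return sum(ay * multiple_kernel(sv, x, beta) for sv, ay in zip(support_vectors, alpha_y)) + bias


# Toy example with 4 hypothetical genes; only genes 0 and 2 keep a nonzero weight.
beta = [0.7, 0.0, 0.3, 0.0]
support_vectors = [[1.0, 0.2, 2.0, -1.0], [-1.0, 0.5, -2.0, 1.0]]
alpha_y = [0.8, -0.8]  # alpha_i * y_i for each support vector (made-up values)
bias = 0.1

print(selected_genes(beta))  # → [0, 2]
x = [1.0, 5.0, 2.0, -3.0]
x_changed = [1.0, -100.0, 2.0, 42.0]  # differs only in the deselected genes 1 and 3
print(decision(x, support_vectors, alpha_y, beta, bias)
      == decision(x_changed, support_vectors, alpha_y, beta, bias))  # → True
```

Because each gene has its own weight, reading the selected genes off the model is just a matter of listing the nonzero beta_d; changing the expression of a deselected gene leaves the prediction unchanged. Here the weights are taken as given rather than learned.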

RESULTS AND CONCLUSION

Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy of more than 90% on both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.