A novel random forests-based feature selection method for microarray expression data analysis.

Author information

Yao Dengju, Yang Jing, Zhan Xiaojuan, Zhan Xiaorong, Xie Zhiqiang

Publication information

Int J Data Min Bioinform. 2015;13(1):84-101. doi: 10.1504/ijdmb.2015.070852.

Abstract

High-dimensional data and large numbers of redundant features in bioinformatics research have created an urgent need for feature selection. This paper proposes a novel random forests-based feature selection method that stratifies the feature space and combines generalised sequential backward search with generalised sequential forward search. Features are ranked by their random forest variable importance scores, and different classifiers serve as the function that evaluates candidate feature subsets. The method is evaluated on five microarray expression datasets (leukaemia, prostate, breast, nervous and DLBCL), on which the average accuracies of the SVM classifier are 100%, 95.24%, 85%, 91.67% and 91.67%, respectively. The results show that the proposed method not only improves classification accuracy but also greatly reduces the computation time of the feature selection process.
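The abstract does not spell out how the strata and the two searches are combined, so the following Python sketch (standard library only) shows one plausible reading under stated assumptions. Features are ranked by precomputed importance scores (in the paper these come from a random forest; here they are hard-coded toy values), the ranking is cut into strata, the top stratum is pruned with a generalised sequential backward search, and each lower stratum is then offered to a generalised sequential forward search. "Generalised" is taken to mean that each step adds or removes the best group of `k` features rather than a single feature. All function names, the number of strata, the step size `k` and the order in which strata are visited are hypothetical, and `toy_evaluate` is a made-up stand-in for the classifier-based subset evaluation (for example, cross-validated SVM accuracy).

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical type: maps a feature subset to a score (higher is better),
# e.g. the cross-validated accuracy of a classifier trained on those features.
Evaluate = Callable[[FrozenSet[str]], float]


def stratify(importance: Dict[str, float], n_strata: int) -> List[List[str]]:
    """Rank features by importance (descending) and cut the ranking into
    at most n_strata contiguous layers of (nearly) equal size."""
    ranked = sorted(importance, key=lambda f: importance[f], reverse=True)
    size = max(1, -(-len(ranked) // n_strata))  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def generalised_backward(subset: Iterable[str], evaluate: Evaluate,
                         k: int) -> Tuple[FrozenSet[str], float]:
    """Generalised sequential backward search: repeatedly remove the group of
    k features whose removal scores best, as long as the score does not drop."""
    if k < 1:
        raise ValueError("k must be >= 1")
    current = frozenset(subset)
    best = evaluate(current)
    while len(current) > k:
        candidates = [current - frozenset(g) for g in combinations(sorted(current), k)]
        trial = max(candidates, key=evaluate)
        score = evaluate(trial)
        if score < best:  # ties are accepted, so smaller subsets are preferred
            break
        current, best = trial, score
    return current, best


def generalised_forward(base: Iterable[str], pool: Iterable[str], evaluate: Evaluate,
                        k: int) -> Tuple[FrozenSet[str], float]:
    """Generalised sequential forward search: repeatedly add the group of k
    features from `pool` that scores best, while the score strictly improves."""
    if k < 1:
        raise ValueError("k must be >= 1")
    current = frozenset(base)
    best = evaluate(current)
    remaining = [f for f in pool if f not in current]
    while len(remaining) >= k:
        candidates = [current | frozenset(g) for g in combinations(remaining, k)]
        trial = max(candidates, key=evaluate)
        score = evaluate(trial)
        if score <= best:
            break
        current, best = trial, score
        remaining = [f for f in remaining if f not in current]
    return current, best


def select_features(importance: Dict[str, float], evaluate: Evaluate,
                    n_strata: int = 3, k: int = 1) -> Tuple[FrozenSet[str], float]:
    """Assumed combination: prune the top stratum backward, then offer each
    lower stratum, in rank order, to the forward search."""
    strata = stratify(importance, n_strata)
    selected, score = generalised_backward(strata[0], evaluate, k)
    for layer in strata[1:]:
        selected, score = generalised_forward(selected, layer, evaluate, k)
    return selected, score


# Toy example with made-up importance scores; in practice these would come
# from a fitted random forest's variable importance.
importance = {"g1": 0.30, "g2": 0.25, "g3": 0.05, "g4": 0.20, "g5": 0.02, "g6": 0.01}
informative = {"g1", "g2", "g4"}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    # Stand-in for classifier accuracy: reward informative genes, penalise size.
    return len(subset & informative) - 0.1 * len(subset)


selected, score = select_features(importance, toy_evaluate, n_strata=3, k=1)
print(sorted(selected), round(score, 2))  # prints ['g1', 'g2', 'g4'] 2.7
```

In this sketch, confining each search to one stratum at a time bounds the number of `k`-combinations scored per step; with `k = 1` the two searches reduce to plain sequential backward and forward selection.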
