Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms.

Affiliation

University of Gaziantep, Gaziantep Vocational School of Higher Education, Computer Programming Division, Gaziantep, Turkey.

Publication information

Comput Methods Programs Biomed. 2011 Dec;104(3):443-51. doi: 10.1016/j.cmpb.2011.03.018. Epub 2011 Apr 30.

Abstract

Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms and evaluate their classification performance on Parkinson's disease, diabetes and heart disease datasets from the literature. In the experiments, the feature dimension of the three datasets is first reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC). Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases. RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms in designing advanced CADx systems.
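The evaluation protocol described in the abstract (leave-one-out validation scored by ACC, a kappa statistic and AUC) can be illustrated with a minimal sketch. It is not the authors' code. The nearest-centroid scorer is a hypothetical stand-in for any one of the 30 base classifiers, the toy data are invented for illustration, and the kappa metric is assumed here to be Cohen's kappa. Rotation forest and CFS are not implemented.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[List[Vector], List[int], Vector], float]


def centroid_score(train_X: List[Vector], train_y: List[int], x: Vector) -> float:
    """Hypothetical stand-in classifier: positive score means class 1 is closer.

    Assumes both classes (0 and 1) are present in the training fold.
    """
    def centroid(label: int) -> List[float]:
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return sq_dist(x, centroid(0)) - sq_dist(x, centroid(1))


def leave_one_out_scores(X: List[Vector], y: List[int], scorer: Scorer) -> List[float]:
    """Score each sample with a model fitted on all the other samples."""
    return [scorer(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) for i in range(len(X))]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: List[int], y_pred: List[int]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    classes = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    # Degenerate case (a single class everywhere): report 0.0 by convention here.
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented one-feature toy data, not taken from the paper's datasets.
    X = [[0.0], [0.2], [0.1], [1.0], [1.2], [1.1]]
    y = [0, 0, 0, 1, 1, 1]
    scores = leave_one_out_scores(X, y, centroid_score)
    preds = [1 if s > 0 else 0 for s in scores]
    print(accuracy(y, preds), cohen_kappa(y, preds), auc(y, scores))
```

In a comparison like the one in the abstract, the same loop would be run once with a base classifier and once with its ensemble counterpart, so that both are scored on identical held-out samples.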
