Abdel-Aal R E
Department of Computer Engineering, King Fahd University of Petroleum and Minerals, P.O. Box 1759, KFUPM, Dhahran 31261, Saudi Arabia.
Comput Methods Programs Biomed. 2005 Nov;80(2):141-53. doi: 10.1016/j.cmpb.2005.08.001. Epub 2005 Sep 19.
This paper demonstrates the use of abductive network classifier committees trained on different features for improving classification accuracy in medical diagnosis. In an earlier publication, committee members were trained on different subsets of the training set to ensure enough diversity for improved committee performance. In situations characterized by high data dimensionality, i.e., a large number of features and relatively few training examples, it may be more advantageous to split the feature set rather than the training set. We describe a novel approach for tentatively ranking the features and forming subsets of uniform predictive quality for training individual members. The abductive network training algorithm is used to select optimum predictors from the feature set at various levels of model complexity specified by the user. Using the resulting tentative ranking, the features are grouped into mutually exclusive subsets of approximately equal predictive power for training the members. The approach is demonstrated on three standard medical diagnosis datasets (breast cancer, heart disease, and diabetes). Three-member committees trained on different feature subsets and using simple output combination methods reduce classification errors by up to 20% compared with the best single model developed with the full feature set. Results are compared with those reported previously for members trained by splitting the training set. Training abductive committee members on feature subsets of approximately equal predictive power achieves both diversity and quality for improved committee performance. Ensemble feature subset selection can be performed using GMDH-based (Group Method of Data Handling) learning algorithms. The approach should be advantageous in situations characterized by high data dimensionality.
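The abstract does not state exactly how the ranked features are dealt into subsets of approximately equal predictive power, nor which simple combination rule is used. The sketch below is therefore only an illustration under stated assumptions. It assumes the tentative ranking (best feature first) has already been produced, for example by the abductive training runs. It then uses a serpentine ("snake draft") assignment, a swapped-in heuristic and not necessarily the paper's procedure, to spread strong and weak features evenly across mutually exclusive subsets. Member outputs are combined by majority vote, which is one common simple combiner. The function names `split_ranked_features` and `majority_vote` are hypothetical.

```python
from typing import List, Sequence


def split_ranked_features(ranked: Sequence[str], n_members: int = 3) -> List[List[str]]:
    """Deal ranked features (best first) into n_members mutually exclusive subsets.

    Assumption: a serpentine order (0, 1, 2, 2, 1, 0, 0, 1, 2, ...) is used so that
    each subset receives a comparable mix of high- and low-ranked features. This is
    an illustrative heuristic, not the grouping rule stated in the paper.
    """
    if n_members < 1:
        raise ValueError("n_members must be >= 1")
    subsets: List[List[str]] = [[] for _ in range(n_members)]
    for i, feature in enumerate(ranked):
        rnd, pos = divmod(i, n_members)
        # Reverse the dealing direction on every other round.
        idx = pos if rnd % 2 == 0 else n_members - 1 - pos
        subsets[idx].append(feature)
    return subsets


def majority_vote(member_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary (0/1) predictions from each member by per-example majority vote."""
    n = len(member_predictions)
    # zip(*...) groups the members' votes for the same example together.
    return [1 if sum(votes) * 2 > n else 0 for votes in zip(*member_predictions)]


if __name__ == "__main__":
    ranked = [f"f{i}" for i in range(1, 10)]  # hypothetical ranking, best first
    print(split_ranked_features(ranked, 3))
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 0]]))
```

With nine ranked features and three members, the serpentine split yields `[f1, f6, f7]`, `[f2, f5, f8]`, and `[f3, f4, f9]`, whose rank sums (14, 15, 16) are close. An odd number of members, such as the three-member committees described above, also avoids ties under majority voting for binary diagnoses.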