Huang Kaizhu, Yang Haiqin, King Irwin, Lyu Michael R
Information Technology Laboratory, Fujitsu Research and Development Center Co., Ltd., Beijing 100016, China.
IEEE Trans Biomed Eng. 2006 May;53(5):821-31. doi: 10.1109/TBME.2006.872819.
Medical diagnosis based on machine learning requires an inherent bias: the diagnosis should favor the "ill" class over the "healthy" class, since misdiagnosing a patient as healthy may delay therapy and aggravate the illness. The objective is therefore not to improve overall classification accuracy, but to improve the sensitivity (the accuracy on the "ill" class) while maintaining an acceptable specificity (the accuracy on the "healthy" class). Some current methods impose a bias toward the important class in roundabout ways, i.e., they use intermediate factors to influence the classification. However, it remains uncertain whether these methods can improve classification performance systematically. In this paper, we use a novel learning tool, the biased minimax probability machine (BMPM), to address the issue more directly. Specifically, the BMPM directly controls the worst-case accuracies to incorporate a bias toward the "ill" class: in a distribution-free way, it derives a decision rule that maximizes the worst-case sensitivity while maintaining an acceptable worst-case specificity. By directly controlling the accuracies, the BMPM provides a more rigorous way to handle medical diagnosis; by deriving a distribution-free decision rule, it distinguishes itself from the large family of generative classifiers, which require an assumption on the data distribution. We evaluate the model and compare it with three traditional classifiers: k-nearest neighbor, naive Bayes, and C4.5. Test results on two medical datasets, the breast-cancer dataset and the heart disease dataset, show that the BMPM outperforms the other three models.
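The abstract does not state how a worst-case accuracy is computed, so the following is only a minimal sketch under stated assumptions. It assumes a linear decision rule (classify as "ill" when a^T x >= b) and uses the one-sided Chebyshev-type bound known from the minimax probability machine literature: over all distributions with a given mean and covariance, the probability of landing on the correct side is at least k^2 / (1 + k^2), where k is the margin of the class mean divided by the standard deviation along a. The function names, the fixed direction `a`, the toy means and covariances, and the simple scan over candidate thresholds are all illustrative assumptions; the actual BMPM optimizes the direction as well, which this sketch does not attempt.

```python
import math


def quad_form(a, cov):
    """Return a^T cov a for a vector a and a square matrix cov (nested lists)."""
    n = len(a)
    return sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))


def worst_case_accuracy(a, b, mean, cov, positive_side=True):
    """Lower bound on Pr{a^T x >= b} (or Pr{a^T x <= b} if positive_side
    is False) over every distribution with the given mean and covariance."""
    margin = sum(ai * mi for ai, mi in zip(a, mean)) - b
    if not positive_side:
        margin = -margin
    if margin <= 0:
        return 0.0  # the class mean is on the wrong side: no guarantee
    var = quad_form(a, cov)
    if var == 0:
        return 1.0  # no spread along a, so every point is on the right side
    kappa_sq = margin ** 2 / var
    return kappa_sq / (1.0 + kappa_sq)


def pick_threshold(a, candidates, ill, healthy, min_specificity):
    """Among candidate thresholds b, keep those whose worst-case specificity
    is acceptable and return (b, sensitivity, specificity) with the highest
    worst-case sensitivity, or None if no candidate qualifies."""
    best = None
    for b in candidates:
        sens = worst_case_accuracy(a, b, ill[0], ill[1], positive_side=True)
        spec = worst_case_accuracy(a, b, healthy[0], healthy[1], positive_side=False)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (b, sens, spec)
    return best


# Hypothetical toy statistics: (mean, covariance) for each class.
identity = [[1.0, 0.0], [0.0, 1.0]]
ill = ([2.0, 0.0], identity)
healthy = ([-1.0, 0.0], identity)
direction = [1.0, 0.0]

best = pick_threshold(direction, [-0.5, 0.0, 0.5, 1.0], ill, healthy, 0.5)
print(best)
```

In this toy setting, lowering the threshold raises the worst-case sensitivity and lowers the worst-case specificity, which is the trade-off the abstract describes: the specificity floor is fixed first, and sensitivity is then pushed as high as that floor allows.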