Diamantidis N, Giakoumakis E A
Informatics Department, Athens University of Economics and Business, Greece.
Artif Intell Med. 1996 Oct;8(5):505-14. doi: 10.1016/S0933-3657(96)00357-0.
Inductive learning algorithms are powerful tools for extracting knowledge from data, and their success in medical domains is well known. In medical diagnosis, as in real-world applications generally, one of the problems these algorithms must deal with is unknown values. In most cases unknown values are treated as missing values, i.e. values that are related to the class of the training example but are absent because no measurement was taken. In this paper we address the problem of don't care values, which are unknown because they are irrelevant to the class of the example. The distinction between don't care values and missing values is important in medical domains: it allows experts to relate each diagnosis to the appropriate subset of attributes. We present techniques for dealing efficiently with don't care values in the induction of decision trees. Furthermore, we examine the importance of the distinction between missing and don't care values, and we investigate the presence of don't care values, as opposed to missing ones, in medical and non-medical real-world datasets.
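The distinction drawn in the abstract between missing and don't care values can be illustrated with a small sketch. It is not the paper's induction technique, which the abstract does not describe. It only shows how marking the two kinds of unknown value separately lets each diagnosis be related to a subset of attributes. The markers `?` and `*`, the function `relevant_attributes`, and the toy records are all hypothetical and made up for illustration.

```python
from collections import defaultdict

MISSING = "?"    # hypothetical marker: relevant to the class, but not measured
DONT_CARE = "*"  # hypothetical marker: irrelevant to the example's class


def relevant_attributes(examples, class_key="diagnosis"):
    """Map each class to the attributes never marked don't care for it.

    A MISSING value does not remove an attribute from the relevant set:
    the attribute still bears on the class, it just was not measured.
    """
    seen = defaultdict(set)        # attributes observed per class
    irrelevant = defaultdict(set)  # attributes marked don't care per class
    for example in examples:
        label = example[class_key]
        for attribute, value in example.items():
            if attribute == class_key:
                continue
            seen[label].add(attribute)
            if value == DONT_CARE:
                irrelevant[label].add(attribute)
    return {label: sorted(seen[label] - irrelevant[label]) for label in seen}


# Toy records invented for this sketch; they are not from any real dataset.
examples = [
    {"fever": "yes", "rash": "*", "blood_test": "?", "diagnosis": "flu"},
    {"fever": "no", "rash": "*", "blood_test": "normal", "diagnosis": "flu"},
    {"fever": "*", "rash": "yes", "blood_test": "*", "diagnosis": "allergy"},
]

print(relevant_attributes(examples))
# {'flu': ['blood_test', 'fever'], 'allergy': ['rash']}
```

If both kinds of unknown value were recorded with the same marker, `blood_test` could not be kept as relevant to `flu` while `rash` is dropped. That loss of information is what the missing versus don't care distinction avoids.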