Limitations of decision trees and automatic learning in real-world medical decision making.

Kokol P, Zorman M, Stiglic MM, Malčić I

Center of Medical Informatics, University of Maribor, FERI, Slovenia.

Stud Health Technol Inform. 1998;52 Pt 1:529-33.

The decision tree approach is one of the most common approaches to automatic learning and decision making. It is popular because decision trees are simple to construct, efficient to use in decision making, and represented in a simple form that humans understand easily. Automatically learned decision trees usually perform very well in various "theoretical" environments, where the training sets are typically large enough for the learning algorithm to construct a hypothesis consistent with the underlying concept. In real life, however, it is often impossible to obtain the desired number of training objects. Typical reasons are the inability to measure some attribute values, the high cost and complexity of such measurements, and the unavailability of all attributes at the same time. There are different ways to deal with some of these problems, but in the delicate field of medical decision making we cannot afford to make inaccurate decisions. We measured the values of 24 attributes before and after 82 operations on children aged between 2 and 10 years. The aim was to find dependencies between the attribute values and a child's predisposition to acidemia, a decrease in blood pH. We were mainly interested in discovering predisposition to two forms of acidosis, metabolic acidosis and respiratory acidosis, both of which can seriously affect a child's health. We constructed several different decision trees from a training set that was complete (no attribute values were missing) but not large enough to avoid overfitting. A common way to evaluate a decision tree is to use a test set. Instead of using a test set, we asked medical experts to examine the generated trees closely, and they evaluated the trees branch by branch. Their comments on the generated trees are presented in this paper.
The comments show that the trees generated from the available training set consist mainly of surprisingly good branches, but some branches are very "stupid", and no medical explanation could be found for them. We therefore conclude that the decision tree concept and automatic learning can be used successfully in real-world situations, within real-world limitations, but only under the guidance of appropriate medical experts.
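
The abstract does not state which tree-induction algorithm the authors used. As an assumption for illustration only, the sketch below grows an unpruned ID3-style tree (information gain on categorical attributes) and flattens it into one IF ... THEN rule per root-to-leaf branch, which is the unit the experts reviewed. The attribute names x1, x2, x3, the class labels, the eight training rows, and the helpers build_tree, branches_as_rules and classify are all hypothetical and unrelated to the study's 24 attributes and 82 operations.

```python
import math
from collections import Counter

# Hypothetical toy data: three categorical attributes (x1, x2, x3) and a
# two-valued class. These rows are illustrative only, not the study's data.
ROWS = [
    {"x1": "high", "x2": "yes", "x3": "a"},
    {"x1": "high", "x2": "yes", "x3": "b"},
    {"x1": "high", "x2": "no", "x3": "a"},
    {"x1": "high", "x2": "no", "x3": "b"},
    {"x1": "low", "x2": "yes", "x3": "a"},
    {"x1": "low", "x2": "yes", "x3": "b"},
    {"x1": "low", "x2": "no", "x3": "a"},
    {"x1": "low", "x2": "no", "x3": "b"},
]
LABELS = ["risk", "risk", "no_risk", "no_risk",
          "no_risk", "no_risk", "no_risk", "no_risk"]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    remainder = 0.0
    for value in sorted({row[attr] for row in rows}):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """ID3-style induction. A leaf is a class label; an inner node is
    (attribute, {value: subtree}). The tree is grown until every leaf is
    pure or no attributes remain, i.e. with no pruning at all."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, children)


def branches_as_rules(tree, conditions=()):
    """Flatten the tree into one IF ... THEN ... rule per root-to-leaf branch,
    the unit an expert can accept or reject one at a time."""
    if not isinstance(tree, tuple):
        cond = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {cond} THEN {tree}"]
    attr, children = tree
    rules = []
    for value, subtree in children.items():
        rules.extend(branches_as_rules(subtree, conditions + ((attr, value),)))
    return rules


def classify(tree, row, default=None):
    """Follow one branch from the root; return default for unseen values."""
    while isinstance(tree, tuple):
        attr, children = tree
        if row[attr] not in children:
            return default
        tree = children[row[attr]]
    return tree


tree = build_tree(ROWS, LABELS, ["x1", "x2", "x3"])
rules = branches_as_rules(tree)
train_acc = sum(classify(tree, r) == lab
                for r, lab in zip(ROWS, LABELS)) / len(ROWS)

for rule in rules:
    print(rule)
print(f"training accuracy: {train_acc:.2f}")
```

On these toy rows the script prints three rules (IF x1 = high AND x2 = no THEN no_risk; IF x1 = high AND x2 = yes THEN risk; IF x1 = low THEN no_risk) and a training accuracy of 1.00. An unpruned tree grown on a consistent training set always fits that set perfectly, so the figure says nothing about whether each branch generalizes or makes medical sense; that judgement has to come from outside the training data, either from a held-out test set or, as in the study, from experts reading every rule.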
