Walker M G, Olshen R A
Section on Medical Informatics, Stanford University School of Medicine, CA 94305.
Proc Annu Symp Comput Appl Med Care. 1992:451-5.
Suppose that we wish to know the probability that an object belongs to a class. For example, we may wish to estimate the probability that a patient has a particular disease, given a set of symptoms, or the probability that a novel peptide binds to a receptor, given the peptide's amino-acid composition. The conventional approach is first to use a classification algorithm to find partitions in feature space and to assign each partition to a class, and then to estimate the conditional probabilities as the proportions of patients or peptides that are correctly and incorrectly classified in each partition. Unfortunately, this estimation method often gives probability estimates that are in error by 20% or more, and thus can cause incorrect decisions. We have implemented and compared alternative methods. In Monte Carlo simulations the alternative methods are substantially more accurate than the conventional method.
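The conventional approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-threshold classifier on one feature (`fit_stump`), the helper names, and the simulation settings in `mean_abs_error` are all assumptions chosen for brevity. The sketch partitions the training cases, takes the class proportion in each partition as the probability estimate, and measures how far those estimates fall from a known true probability when the feature carries no information.

```python
import random

def fit_stump(xs, ys):
    """Hypothetical stand-in classifier: choose the threshold on a 1-D feature
    that minimizes training misclassifications. Cases with x <= threshold form
    the left partition, the rest form the right partition."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Each partition is assigned its majority class, so the training
        # errors in a partition equal the size of its minority class.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, t)
    return best[1]

def partition_proportions(xs, ys, threshold):
    """Conventional estimate: the fraction of class-1 training cases in each
    partition, computed on the same data used to find the partition."""
    def proportion(part):
        return sum(part) / len(part) if part else float("nan")
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return proportion(left), proportion(right)

def mean_abs_error(n=20, trials=200, seed=0, true_p=0.5):
    """Monte Carlo check (assumed setup): the feature is pure noise, so the
    true class-1 probability is true_p in every partition. Returns the mean
    absolute gap between the partition proportions and true_p."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [1 if rng.random() < true_p else 0 for _ in range(n)]
        threshold = fit_stump(xs, ys)
        for p in partition_proportions(xs, ys, threshold):
            if p == p:  # skip NaN from an empty partition
                gaps.append(abs(p - true_p))
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    print(mean_abs_error())
```

Because the same cases are used both to place the partition and to compute the proportions, the estimates in this toy setting tend to drift away from the true probability, which is the kind of error the abstract attributes to the conventional method.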