Walker M G, Olshen R A
Section on Medical Informatics, Stanford University School of Medicine, CA 94305.
Proc Annu Symp Comput Appl Med Care. 1992:451-5.
Suppose that we wish to know the probability that an object belongs to a class. For example, we may wish to estimate the probability that a patient has a particular disease, given a set of symptoms, or the probability that a novel peptide binds to a receptor, given the peptide's amino-acid composition. The conventional approach is first to use a classification algorithm to partition feature space and assign each partition to a class, and then to estimate the conditional probabilities as the proportions of patients or peptides that are correctly and incorrectly classified in each partition. Unfortunately, this estimation method often gives probability estimates that are in error by 20% or more, which can lead to incorrect decisions. We have implemented alternative estimation methods and compared them with the conventional method. In Monte Carlo simulations, the alternative methods are substantially more accurate than the conventional method.
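The conventional estimator described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation. `partition_proportions` computes, for each cell of a given partition, the share of training cases in each class. `best_threshold` is a toy one-split classifier that stands in for whatever classification algorithm defines the partitions. `simulate` is a small Monte Carlo run that assumes the label is independent of the feature, so the true conditional probability in every cell is 0.5.

```python
import random
from collections import Counter, defaultdict
from statistics import mean
from typing import Callable, Hashable, Sequence


def partition_proportions(
    features: Sequence[float],
    labels: Sequence[Hashable],
    partition_of: Callable[[float], Hashable],
) -> dict:
    """Conventional estimate: P(class | cell) = share of training cases of that class in the cell."""
    counts: dict = defaultdict(Counter)
    for x, y in zip(features, labels):
        counts[partition_of(x)][y] += 1
    return {
        cell: {cls: n / sum(c.values()) for cls, n in c.items()}
        for cell, c in counts.items()
    }


def best_threshold(xs: Sequence[float], ys: Sequence[bool]) -> float:
    """Hypothetical one-split 'classifier': choose the cut t (among observed x values)
    that maximizes training accuracy when cases with x > t are assigned to class True."""
    best_t, best_acc = xs[0], -1
    for t in sorted(xs):
        acc = sum((x > t) == y for x, y in zip(xs, ys))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def simulate(n_train: int = 50, trials: int = 200, seed: int = 0) -> list[float]:
    """Monte Carlo run where the label is pure noise, so the true P(True | any cell) is 0.5.
    Returns the resubstitution estimate of P(True | x > t) from each trial."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_train)]
        ys = [rng.random() < 0.5 for _ in range(n_train)]
        t = best_threshold(xs, ys)
        # The same training data both choose the cut and supply the proportions.
        cells = partition_proportions(xs, ys, lambda x: x > t)
        if True in cells:  # skip trials where the chosen cut leaves the cell empty
            estimates.append(cells[True].get(True, 0.0))
    return estimates


if __name__ == "__main__":
    est = simulate()
    print(f"true P = 0.500, mean resubstitution estimate = {mean(est):.3f}")
```

Because the cut is chosen to maximize training accuracy, the cell labelled True always contains at least as many True cases as False ones in the training data, so every estimate from `simulate` is at least 0.5 even though the true value is exactly 0.5. One generic remedy is to compute the proportions on data not used to choose the partition (for example, a held-out sample). This sketch does not establish whether that remedy is among the alternative methods compared here.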