Murphy C K
Penn State York, Pennsylvania 17403, USA.
Med Decis Making. 2001 Sep-Oct;21(5):368-75. doi: 10.1177/0272989X0102100503.
The purpose of this article is to compare the diagnostic accuracy of induced decision trees with that of pruned neural networks, and to improve the accuracy and interpretation of breast cancer diagnosis from fine-needle aspirate readings by identifying cases likely to be misclassified by induced decision rules.
Using an online database of 699 cases of suspected breast cancer and their corresponding fine-needle aspirate readings, decision trees were induced from a randomly selected half of the cases. Accuracy was measured on the remaining cases over successive partitions of the data, and the pattern of errors across the multiple decision trees was examined. A smaller data set was then created in which cases were assigned to 2 classes, (1) correctly classified and (2) misclassified by a decision tree, instead of the original benign and malignant classes. From this data set, decision trees describing the misclassified cases were induced.
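The protocol above (train on a random half, test on the other half, repeat, then relabel cases by whether a tree got them right) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not name the induction algorithm, so a small CART-style tree with Gini splits is swapped in, and its depth limit `max_depth` stands in for pruning severity. Each case is assumed to be a list of numeric attribute readings, and all function names and the "correct"/"misclassified" label strings are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def induce_tree(X, y, max_depth):
    """Induce a binary tree of axis-parallel threshold splits.

    Hypothetical stand-in for the paper's unnamed induction algorithm:
    a smaller max_depth plays the role of more severe pruning.
    A leaf is a class label; an internal node is (feature, threshold, left, right).
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no attribute separates the cases
        return majority
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            induce_tree([X[i] for i in left_idx], [y[i] for i in left_idx], max_depth - 1),
            induce_tree([X[i] for i in right_idx], [y[i] for i in right_idx], max_depth - 1))

def classify(tree, row):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def evaluate_partitions(X, y, max_depth, n_partitions=10, seed=0):
    """Repeat a random 50/50 train/test split and return the mean test accuracy."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accuracies = []
    for _ in range(n_partitions):
        rng.shuffle(idx)
        half = len(idx) // 2
        train, test = idx[:half], idx[half:]
        tree = induce_tree([X[i] for i in train], [y[i] for i in train], max_depth)
        n_correct = sum(int(classify(tree, X[i]) == y[i]) for i in test)
        accuracies.append(n_correct / len(test))
    return sum(accuracies) / len(accuracies)

def relabel_by_outcome(tree, X, y):
    """Replace the benign/malignant labels with whether `tree` classified each case correctly."""
    return ["correct" if classify(tree, row) == lab else "misclassified"
            for row, lab in zip(X, y)]
```

Comparing `evaluate_partitions` at several `max_depth` values mirrors the comparison of more and less severely pruned trees. Given a trained tree and its test cases, `induce_tree(X_test, relabel_by_outcome(tree, X_test, y_test), max_depth=2)` would then induce a tree describing the misclassified cases, in the spirit of the second step.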
Larger, less severely pruned decision trees were more accurate in breast cancer diagnosis on both training and test data. The accuracy of the induced decision trees exceeded that reported for the smaller pruned neural networks. Combining the classifications of 2 trees was effective in identifying malignancies missed by a single tree. Induced decision trees could identify patterns associated with misclassified cases, but identifying errors inductively did not reduce the overall error rate.
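The abstract does not say how the 2 trees' classifications were combined. One plausible rule, assumed here for illustration, is an OR rule for malignancy: a case is flagged malignant if either tree flags it. The sketch below shows how such a rule can recover malignancies missed by a single tree; the function names and the "malignant"/"benign" label strings are hypothetical, and a binary benign/malignant labeling is assumed.

```python
def combine_or(pred_a, pred_b, positive="malignant"):
    """Assumed OR rule: call a case positive if either tree calls it positive;
    otherwise keep tree A's (negative) label."""
    return [positive if positive in (a, b) else a for a, b in zip(pred_a, pred_b)]

def sensitivity(pred, truth, positive="malignant"):
    """Fraction of truly positive cases that the predictions flag as positive."""
    flags = [p == positive for p, t in zip(pred, truth) if t == positive]
    return sum(flags) / len(flags) if flags else float("nan")

def specificity(pred, truth, positive="malignant"):
    """Fraction of truly negative cases that the predictions leave negative."""
    flags = [p != positive for p, t in zip(pred, truth) if t != positive]
    return sum(flags) / len(flags) if flags else float("nan")
```

Because the set of cases flagged by the OR rule contains each tree's flagged set, its sensitivity can never be lower than either tree's and its specificity can never be higher. Checking both measures on held-out cases shows what the extra malignancies cost in false positives.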
In this application, a model that is too compact identifies fewer cases of the minority class, malignancy. New methods that combine models and examine classification errors can improve diagnosis by identifying more malignancies and by describing ambiguous cases.