Jiang Yulei
Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, MC2026, The University of Chicago, 5841 South Maryland Ave., Chicago, IL 60637, USA.
IEEE Trans Med Imaging. 2003 Jul;22(7):913-21. doi: 10.1109/TMI.2003.815061.
Analysis of the performance of artificial neural networks (ANNs) is usually based on aggregate results on a population of cases. In this paper, we analyze the ANN output corresponding to an individual case. We show variability in the outputs of multiple ANNs that are trained and "optimized" from a common set of training cases. We predict this variability on theoretical grounds: multiple ANNs can be optimized to achieve similar overall performance on a population of cases, yet produce different outputs for the same individual case because the ANNs use different weights. We use simulations to show that the average standard deviation in the ANN output can be two orders of magnitude higher than the standard deviation in the ANN overall performance measured by the Az value (the area under the ROC curve). We further show this variability using an example in mammography, where the ANNs are used to classify clustered microcalcifications as malignant or benign based on image features extracted from mammograms. This variability in the ANN output is generally not recognized because a trained individual ANN becomes a deterministic model. Recognition of this variability and the deterministic view of the ANN present a fundamental contradiction. The implications of this variability for the classification task warrant additional study.
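The kind of comparison described in the abstract can be illustrated with a small sketch. It is not the paper's simulation: the synthetic two-feature data, the three-unit hidden layer, the training settings, and the helper names (`make_cases`, `train`, `auc`, `run`) are all assumptions made for illustration. Several networks are trained on the same cases from different random seeds, and the spread of their outputs on each individual test case is compared with the spread of their overall performance, measured here with the nonparametric (Mann-Whitney) area under the ROC curve rather than a fitted Az.

```python
import math
import random
import statistics

def make_cases(n, rng):
    """Synthetic cases: two overlapping 2-D Gaussian classes (an assumption)."""
    cases = []
    for i in range(n):
        y = i % 2                      # 1 = "malignant", 0 = "benign"
        mu = 0.75 if y else -0.75
        cases.append(((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)), y))
    return cases

def forward(net, x):
    """One tanh hidden layer followed by a sigmoid output unit."""
    h = [math.tanh(w[0] * x[0] + w[1] * x[1] + w[2]) for w in net["W"]]
    z = sum(v * hj for v, hj in zip(net["v"], h)) + net["v"][-1]
    return h, 1.0 / (1.0 + math.exp(-z))

def train(cases, seed, n_hidden=3, epochs=30, lr=0.05):
    """Stochastic gradient descent on cross-entropy; the seed sets the
    initial weights and the presentation order, nothing else."""
    rng = random.Random(seed)
    net = {"W": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)],
           "v": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]}
    order = list(cases)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            h, p = forward(net, x)
            d = p - y                  # dLoss/dz for sigmoid + cross-entropy
            for j, hj in enumerate(h):
                g = d * net["v"][j] * (1.0 - hj * hj)   # backprop to hidden unit j
                net["v"][j] -= lr * d * hj
                w = net["W"][j]
                w[0] -= lr * g * x[0]
                w[1] -= lr * g * x[1]
                w[2] -= lr * g
            net["v"][-1] -= lr * d     # output bias
    return net

def auc(scores, labels):
    """Nonparametric area under the ROC curve (Mann-Whitney statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run(n_nets=10):
    data_rng = random.Random(0)
    train_cases = make_cases(200, data_rng)   # common training set for all ANNs
    test_cases = make_cases(200, data_rng)
    labels = [y for _, y in test_cases]
    outputs = [[forward(train(train_cases, seed), x)[1] for x, _ in test_cases]
               for seed in range(n_nets)]
    aucs = [auc(o, labels) for o in outputs]
    # Standard deviation across networks, computed separately for each case.
    per_case_sd = [statistics.stdev(col) for col in zip(*outputs)]
    return statistics.mean(per_case_sd), statistics.stdev(aucs), aucs

if __name__ == "__main__":
    mean_sd, auc_sd, aucs = run()
    print("mean per-case SD of ANN output:", mean_sd)
    print("SD of AUC across ANNs:        ", auc_sd)
```

The two printed numbers are the quantities the abstract contrasts. How large the gap turns out to be in this toy setting depends on the assumed data and training settings, so it should not be read as a reproduction of the reported two orders of magnitude.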