Dvorchik I, Subotin M, Marsh W, McMichael J, Fung J J
Transplantation Institute, University of Pittsburgh, PA, USA.
Methods Inf Med. 1996 Mar;35(1):12-8.
A novel multisolutional clustering and quantization (MCQ) algorithm has been developed that provides a flexible way to preprocess data. The study tested whether the algorithm would improve neural network performance and whether it would enable neural networks to handle missing data. Both questions were assessed by comparing the performance of neural networks that used a well-documented data set to predict outcome following liver transplantation. Compared with simple linear scaling, this new approach to data preprocessing led to a statistically significant improvement in network performance. The results also showed that, on a data set in which 59.4% of cases contained missing values, coding missing data as zeroes in combination with the MCQ algorithm significantly improved neural network performance compared with replacing missing values with either series means or medians.
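The comparison conditions named in the abstract (simple linear scaling, and replacing missing values with the series mean, the series median, or zero) can be illustrated with a minimal Python sketch. This is not the MCQ algorithm, whose details the abstract does not give. The function names, the use of `None` to mark a missing value, and the [0, 1] target range are assumptions made for illustration only. The abstract also does not say in what order scaling and missing-value coding were applied, so the functions are kept separate.

```python
from statistics import mean, median


def linear_scale(values, lo=0.0, hi=1.0):
    """Min-max scale the observed values to [lo, hi]; missing (None) stays None.

    The [0, 1] default range is an assumption, not taken from the abstract.
    """
    observed = [v for v in values if v is not None]
    vmin, vmax = min(observed), max(observed)
    span = vmax - vmin
    if span == 0:
        # Constant series: map every observed value to the lower bound.
        return [None if v is None else lo for v in values]
    return [
        None if v is None else lo + (hi - lo) * (v - vmin) / span
        for v in values
    ]


def impute(values, strategy):
    """Replace missing (None) entries with the series mean, median, or zero."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "zero":
        fill = 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def fraction_of_cases_with_missing(rows):
    """Share of cases (rows) that contain at least one missing value."""
    return sum(any(v is None for v in row) for row in rows) / len(rows)


# Hypothetical series with one missing entry (not data from the study).
series = [2.0, None, 4.0, 10.0]
scaled = linear_scale(series)
mean_filled = impute(series, "mean")
median_filled = impute(series, "median")
zero_filled = impute(series, "zero")
```

A statistic such as the 59.4% figure corresponds to what `fraction_of_cases_with_missing` computes: the proportion of cases with at least one missing value, not the proportion of individual missing entries.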