Sch. of Electr. Eng., Purdue Univ., West Lafayette, IN.
IEEE Trans Image Process. 1997;6(3):398-406. doi: 10.1109/83.557343.
This paper discusses a criterion for testing a vector quantizer (VQ) codebook obtained by "training". When a VQ codebook is designed by a clustering algorithm using a training set, a "time-average" distortion, called the training-set distortion (TSD), is usually calculated in each iteration of the algorithm, since the input probability function is generally unknown and cumbersome to deal with. The algorithm stops when the TSD ceases to decrease significantly. To test the resulting codebook, the validating-set distortion (VSD) is calculated on a separate validating set (VS). Codebooks that yield a small difference between the TSD and the VSD are regarded as good ones. However, the difference VSD-TSD is not necessarily a desirable criterion for testing a trained codebook unless certain conditions are satisfied. One condition previously assumed to be important is that the VS must be large enough to approximate the source distribution well. This condition implies a greater computational burden when testing a codebook. In this paper, we first discuss the condition under which the difference VSD-TSD is a meaningful codebook testing criterion. Then, the convergence properties of the VSD, a time-average quantity, are investigated. Finally, we show that for large codebooks, a VS size as small as the size of the codebook is sufficient to evaluate the VSD. This paper consequently presents a simple method to test trained codebooks for VQs. Experimental results on synthetic data and real images that support the analysis are also provided and discussed.
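The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the clustering algorithm or the distortion measure, so the sketch assumes a generalized Lloyd (k-means style) iteration and mean squared error. The names `mean_distortion`, `train_codebook`, and `tol`, the synthetic Gaussian source, and the set sizes are all hypothetical choices made for illustration. The validating set is deliberately drawn with as many vectors as the codebook has codewords, to show how VSD-TSD would be computed in that setting; the sketch does not reproduce the paper's analysis or results.

```python
import numpy as np


def mean_distortion(data, codebook):
    """Time-average squared-error distortion under nearest-codeword encoding.

    Returns the mean distortion and the index of the nearest codeword
    for every input vector. Squared error is an assumed distortion measure.
    """
    # Squared Euclidean distance from every vector to every codeword.
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean(), d2.argmin(axis=1)


def train_codebook(train, size, tol=1e-3, max_iter=100, seed=0):
    """Lloyd-style clustering (assumed algorithm).

    Stops when the relative decrease of the TSD falls below `tol`,
    mirroring "stops when the TSD ceases to decrease significantly".
    """
    rng = np.random.default_rng(seed)
    # Initialize with distinct training vectors chosen at random.
    codebook = train[rng.choice(len(train), size=size, replace=False)].copy()
    prev_tsd = np.inf
    for _ in range(max_iter):
        tsd, labels = mean_distortion(train, codebook)
        if prev_tsd - tsd <= tol * tsd:
            break
        prev_tsd = tsd
        # Centroid update; empty cells keep their previous codeword.
        for k in range(size):
            members = train[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    # Recompute so the returned TSD matches the returned codebook.
    tsd, _ = mean_distortion(train, codebook)
    return codebook, tsd


# Hypothetical synthetic source: 2-D standard Gaussian vectors.
rng = np.random.default_rng(1)
train = rng.normal(size=(4000, 2))
valid = rng.normal(size=(64, 2))  # VS with as many vectors as codewords

codebook, tsd = train_codebook(train, size=64)
vsd, _ = mean_distortion(valid, codebook)
print(f"TSD={tsd:.4f}  VSD={vsd:.4f}  VSD-TSD={vsd - tsd:.4f}")
```

Changing the number of rows in `valid` while keeping `codebook` fixed is one way to observe how the VSD estimate varies with the size of the validating set.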