Li Qi, Juang Biing-Hwang
Bell Labs, Lucent Technologies, USA.
IEEE Trans Neural Netw. 2006 Sep;17(5):1212-21. doi: 10.1109/TNN.2006.875992.
Discriminative training refers to an approach to pattern recognition based on direct minimization of a cost function commensurate with the performance of the recognition system. This is in contrast to the procedure of probability distribution estimation conventionally required in the Bayes formulation of the statistical pattern recognition problem. Currently, most discriminative training algorithms for nonlinear classifier design are based on gradient-descent (GD) methods for cost minimization. These algorithms are easy to derive and effective in practice, but they are slow to train and make the selection of learning rates difficult. To address these problems, we present our study of a fast discriminative training algorithm. The algorithm initializes the parameters by the expectation-maximization (EM) algorithm and then uses a set of closed-form formulas, derived in this paper, to further optimize a proposed minimum-error-rate objective. Experiments in speech applications show that the algorithm provides better recognition accuracy in fewer iterations than the EM algorithm and than a neural network trained by hundreds of GD iterations. Although some convergence properties require further study, the proposed objective and the derived formulas can benefit further research on this problem.
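For concreteness, the following is a minimal sketch, not the paper's algorithm, of the two-stage recipe the abstract describes: maximum-likelihood initialization (the closed-form special case of EM for a single diagonal Gaussian per class) followed by discriminative refinement. Since the paper's closed-form update formulas are not given in the abstract, the refinement step below uses the standard GD-based minimum classification error (MCE) update that the abstract cites as the conventional approach; all function names and parameter values are illustrative assumptions.

```python
import numpy as np

def fit_ml(X, y, n_classes):
    # Per-class ML estimates of mean and variance: for a single Gaussian
    # per class, EM reduces to these closed-form estimates.
    mu = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    var = np.stack([X[y == k].var(axis=0) + 1e-6 for k in range(n_classes)])
    return mu, var

def log_gauss(X, mu, var):
    # Log-likelihood of each sample under each diagonal Gaussian -> (N, K).
    d = X[:, None, :] - mu[None, :, :]
    return -0.5 * np.sum(d * d / var[None] + np.log(2 * np.pi * var[None]), axis=2)

def mce_step(X, y, mu, var, lr=0.05, gamma=2.0):
    # One gradient-descent step on the sigmoid-smoothed misclassification
    # measure d = g_rival - g_true (the classic GD-based MCE formulation).
    g = log_gauss(X, mu, var)                   # class discriminants, (N, K)
    N, K = g.shape
    g_true = g[np.arange(N), y]
    g_masked = g.copy()
    g_masked[np.arange(N), y] = -np.inf
    rival = np.argmax(g_masked, axis=1)         # best competing class
    d = g_masked[np.arange(N), rival] - g_true  # misclassification measure
    s = 1.0 / (1.0 + np.exp(-gamma * d))        # smoothed 0/1 error
    w = gamma * s * (1.0 - s)                   # d(loss)/d(d) per sample
    grad = np.zeros_like(mu)
    for k in range(K):
        Xk = (X - mu[k]) / var[k]               # d g_k / d mu_k
        grad[k] -= (w * (y == k)) @ Xk          # true class: d falls as g_true rises
        grad[k] += (w * (rival == k)) @ Xk      # rival class: d rises as g_rival rises
    return mu - lr * grad / len(X)

# Example usage on synthetic two-class data:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
mu, var = fit_ml(X, y, 2)
for _ in range(20):
    mu = mce_step(X, y, mu, var)
```

The sigmoid smoothing parameter gamma controls how closely the loss approximates the raw error count, and the step size lr must be tuned jointly with it, which illustrates the learning-rate sensitivity of GD-based discriminative training that motivates the paper's closed-form alternative.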