Kuramata Hiroto, Yagi Hideki
Department of Computer and Network Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu 182-8585, Tokyo, Japan.
Entropy (Basel). 2022 Apr 30;24(5):635. doi: 10.3390/e24050635.
We consider a binary classification problem in which a test sequence must be assigned to the source that generated it. The system classifies the test sequence based on empirically observed (training) sequences obtained from unknown sources P1 and P2. We analyze the asymptotic fundamental limits of statistical classification for sources with multiple subclasses. We investigate the first- and second-order maximum error exponents under the constraint that the type-I error probability decays exponentially fast for all pairs of distributions and the type-II error probability is upper bounded by a small constant. In this paper, we first give a classifier that achieves the asymptotically maximum error exponent in the class of deterministic classifiers for sources with multiple subclasses, and then provide a characterization of the first-order error exponent. We next provide a characterization of the second-order error exponent in the case where only P2 has multiple subclasses and P1 does not. Finally, we generalize our results to the case where P1 is a stationary memoryless source and P2 is a mixed memoryless source with a general mixture.
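For orientation, a classical baseline for this training-sequence setting is Gutman's type-based test for the case where each source is a single memoryless distribution (no subclasses). It is not the classifier proposed in this paper. The test compares the empirical distribution (type) of the training sequence from P1 with the type of the test sequence through the generalized Jensen-Shannon divergence GJS(p, q, alpha) = alpha D(p || m) + D(q || m), where m = (alpha p + q) / (1 + alpha) and alpha is the ratio of training length to test length. It declares P1 when this divergence is at most a threshold lam, which sets the target type-I error exponent. The sketch below assumes a finite alphabet, natural logarithms, and hypothetical helper names (empirical_type, kl_divergence, gjs, gutman_classify) chosen for illustration.

```python
import math
from collections import Counter


def empirical_type(seq, alphabet):
    """Empirical distribution (type) of seq over a finite alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q[a] > 0 wherever p[a] > 0."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)


def gjs(p, q, alpha):
    """Generalized Jensen-Shannon divergence GJS(p, q, alpha)."""
    # Weighted mixture; positive wherever p or q is positive since alpha > 0.
    mix = {a: (alpha * p[a] + q[a]) / (1 + alpha) for a in p}
    return alpha * kl_divergence(p, mix) + kl_divergence(q, mix)


def gutman_classify(x1, y, alphabet, lam):
    """Return 1 if test sequence y is declared to come from P1, else 2.

    x1 is the training sequence from P1; lam is the decision threshold.
    """
    alpha = len(x1) / len(y)  # training-to-test length ratio
    t_x1 = empirical_type(x1, alphabet)
    t_y = empirical_type(y, alphabet)
    return 1 if gjs(t_x1, t_y, alpha) <= lam else 2


if __name__ == "__main__":
    # Matching types give GJS = 0, so the test sequence is assigned to P1.
    print(gutman_classify("aabb" * 5, "ab" * 5, "ab", lam=0.1))
    # Disjoint types give GJS = 2 ln 2 (about 1.386), so it is assigned to P2.
    print(gutman_classify("a" * 10, "b" * 10, "ab", lam=0.1))
```

In this baseline only the P1 training sequence enters the decision rule. Raising lam lowers the type-II error at the cost of a smaller guaranteed type-I exponent, which is the trade-off the error exponents in the abstract quantify.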