Westover M Brandon, O'Sullivan Joseph A
Department of Neurology, Massachusetts General Hospital, Boston, MA 02114-2622 USA.
Department of Electrical Engineering, Washington University, St. Louis, MO 63130 USA.
IEEE Trans Inf Theory. 2008 Jan;54(1):299-320. doi: 10.1109/tit.2007.911296. Epub 2008 Jan 4.
Biological and machine pattern recognition systems face a common challenge: given sensory data about an unknown pattern, classify the pattern by searching for the best match within a library of representations stored in memory. In many cases, the number of patterns to be discriminated and the richness of the raw data force recognition systems to internally represent memory and sensory information in a compressed format. However, these representations must preserve enough information to accommodate the variability and complexity of the environment; otherwise, recognition will be unreliable. Thus, there is an intrinsic tradeoff between the amount of resources devoted to data representation and the complexity of the environment in which a recognition system may reliably operate. In this paper, we describe a mathematical model for pattern recognition systems subject to resource constraints, and show how the aforementioned resource-complexity tradeoff can be characterized in terms of three rates related to the number of bits available for representing memory and sensory data, and the number of patterns populating a given statistical environment. We prove single-letter information-theoretic bounds governing the achievable rates, and investigate in detail two illustrative cases where the pattern data are either binary or Gaussian.