Vladimir Pestov
Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, ON, K1N 6N5, Canada.
Neural Netw. 2008 Mar-Apr;21(2-3):204-13. doi: 10.1016/j.neunet.2007.12.030. Epub 2007 Dec 27.
We carry out a deeper analysis of the axiomatic approach to the concept of intrinsic dimension of a dataset that we proposed in our IJCNN'07 paper. The main features of the approach are that a high intrinsic dimension of a dataset reflects the presence of the curse of dimensionality (in a certain mathematically precise sense), and that the intrinsic dimension of a discrete i.i.d. sample drawn from a low-dimensional manifold is, with high probability, close to the dimension of the manifold itself. At the same time, the intrinsic dimension of a sample is easily corrupted by moderate high-dimensional noise (of the same amplitude as the size of the manifold), and it suffers from prohibitively high computational complexity (computing it is an NP-complete problem). We outline a possible way to overcome these difficulties.
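The two phenomena described in the abstract — a sample recovering the dimension of the underlying manifold, and that estimate being corrupted by high-dimensional noise of comparable amplitude — can be illustrated numerically. The sketch below does not compute the paper's axiomatic intrinsic dimension (which is NP-complete to evaluate); as a stand-in it uses a simple PCA-based dimension estimate. The sample size `n = 1000`, ambient dimension `D = 100`, manifold dimension `d = 2`, the 95% variance threshold, and the noise scale are all illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_dimension(X, var_fraction=0.95):
    """Number of principal components needed to explain var_fraction of the variance."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data matrix give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var), var_fraction) + 1)

n, D, d = 1000, 100, 2

# An i.i.d. sample from a d-dimensional flat "manifold" embedded in R^D.
basis = np.linalg.qr(rng.standard_normal((D, d)))[0]  # orthonormal d-frame in R^D
sample = rng.uniform(-1, 1, size=(n, d)) @ basis.T    # points lying on the 2-plane

clean_dim = pca_dimension(sample)  # recovers the manifold dimension: 2

# Gaussian noise with total amplitude ~1, comparable to the size of the manifold,
# but spread over all D ambient coordinates.
noisy = sample + rng.normal(0.0, 1.0 / np.sqrt(D), size=(n, D))
noisy_dim = pca_dimension(noisy)   # estimate is badly inflated by the noise
```

Even though each noise coordinate has tiny variance (0.01 here), it contributes variance across all 100 directions, so the 95%-variance cutoff is only reached after dozens of components: the noisy estimate is far above 2, mirroring the abstract's point that moderate high-dimensional noise easily corrupts the intrinsic dimension of a sample.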