Zhang Junping, Wang Xiaodan, Kruger Uwe, Wang Fei-Yue
Shanghai Key Laboratory of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai 200433, China.
IEEE Trans Neural Netw. 2011 Mar;22(3):367-80. doi: 10.1109/TNN.2010.2100408. Epub 2010 Dec 30.
Most partitioning algorithms iteratively partition a space into cells that contain underlying linear or nonlinear structures using linear partitioning strategies. The compactness of each cell depends on how well the (locally) linear partitioning strategy approximates the intrinsic structure. To partition a compact structure for complex data in a nonlinear context, this paper proposes a nonlinear partition strategy. This is a principal curve tree (PC-tree), which is implemented iteratively. Given that a PC passes through the middle of the data distribution, it allows for partitioning based on the arc length of the PC. To enhance the partitioning of a given space, a residual version of the PC-tree algorithm is developed, denoted here as the principal component analysis tree (PCR-tree) algorithm. Because of its residual property, the PCR-tree can yield the intrinsic dimension of high-dimensional data. Comparisons presented in this paper confirm that the proposed PC-tree and PCR-tree approaches show a better performance than several other competing partitioning algorithms in terms of vector quantization error and nearest neighbor search. The comparison also shows that the proposed algorithms outperform competing linear methods in total average coverage which measures the nonlinear compactness of partitioning algorithms.
大多数划分算法使用线性划分策略将空间迭代地划分为包含潜在线性或非线性结构的单元。每个单元的紧凑性取决于(局部)线性划分策略对内在结构的近似程度。为了在非线性环境中对复杂数据的紧凑结构进行划分,本文提出了一种非线性划分策略。这是一种主曲线树(PC树),它是迭代实现的。鉴于主曲线穿过数据分布的中间,它允许基于主曲线的弧长进行划分。为了增强对给定空间的划分,开发了PC树算法的残差版本,在此表示为主成分分析树(PCR树)算法。由于其残差特性,PCR树可以得出高维数据的内在维度。本文给出的比较结果证实,所提出的PC树和PCR树方法在矢量量化误差和最近邻搜索方面比其他几种竞争划分算法表现更好。比较还表明,所提出的算法在衡量划分算法非线性紧凑性的总平均覆盖率方面优于竞争的线性方法。