IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2227-2241. doi: 10.1109/TPAMI.2016.2635657. Epub 2016 Dec 5.
Many scientific datasets are of high dimension, and the analysis usually requires retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are mathematically formulated by curves, which is quite restrictive for real applications. A few methods can overcome the above problem, but they either require complicated human-made rules for a specific task with lack of adaption flexibility to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected `1 graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data, simultaneously. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
许多科学数据集具有高维特性,分析通常需要保留数据的最重要结构。主曲线是一种广泛用于此目的的方法。然而,许多现有的方法仅适用于结构可以通过曲线进行数学公式化的数据,这对于实际应用来说限制很大。有一些方法可以克服上述问题,但它们要么需要针对特定任务的复杂人为规则,缺乏对不同任务的适应灵活性,要么无法获得数据的显式结构。为了解决这些问题,我们开发了一种新的主图和结构学习框架,该框架基于反向图嵌入来捕获底层图结构的局部信息。作为展示,提出了可以学习生成树或加权无向图的模型,并开发了一种新的学习算法,可以从数据中同时学习一组主点和图结构。新算法简单且保证收敛。然后,我们将提出的框架扩展到处理大规模数据。在各种合成和六个真实世界数据集上的实验结果表明,该方法与基线相比具有优势,并且可以正确揭示潜在结构。