Cheng Xiuyuan, Landa Boris
Department of Mathematics, Duke University, Durham, NC 27708, US.
Department of Electrical and Computer Engineering, Yale University, New Haven, CT 06520, US.
Inf inference. 2024 Sep 20;13(4):iaae026. doi: 10.1093/imaiai/iaae026. eCollection 2024 Dec.
Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis and can be computed efficiently by Sinkhorn-Knopp (SK) iterations. This paper proves the convergence of bi-stochastically normalized graph Laplacian to manifold (weighted-)Laplacian with rates, when [Formula: see text] data points are i.i.d. sampled from a general [Formula: see text]-dimensional manifold embedded in a possibly high-dimensional space. Under certain joint limit of [Formula: see text] and kernel bandwidth [Formula: see text], the point-wise convergence rate of the graph Laplacian operator (under 2-norm) is proved to be [Formula: see text] at finite large [Formula: see text] up to log factors, achieved at the scaling of [Formula: see text]. When the manifold data are corrupted by outlier noise, we theoretically prove the graph Laplacian point-wise consistency which matches the rate for clean manifold data plus an additional term proportional to the boundedness of the inner-products of the noise vectors among themselves and with data vectors. Motivated by our analysis, which suggests that not exact bi-stochastic normalization but an approximate one will achieve the same consistency rate, we propose an approximate and constrained matrix scaling problem that can be solved by SK iterations with early termination. Numerical experiments support our theoretical results and show the robustness of bi-stochastically normalized graph Laplacian to high-dimensional outlier noise.
双随机归一化在基于图的数据分析中为图拉普拉斯算子提供了一种替代的归一化方法,并且可以通过Sinkhorn-Knopp(SK)迭代有效地进行计算。本文证明了在从嵌入到可能高维空间中的一般(d)维流形中独立同分布采样(n)个数据点时,双随机归一化图拉普拉斯算子以一定速率收敛到流形(加权)拉普拉斯算子。在(n)和核带宽(h)的某些联合极限下,证明了图拉普拉斯算子(在2-范数下)在有限大的(n)时,直至对数因子,逐点收敛速率为(O(\frac{h^2}{n})),在(h = O(\sqrt{\frac{1}{n}}))的尺度下实现。当流形数据被离群值噪声破坏时,我们从理论上证明了图拉普拉斯算子的逐点一致性,它与干净流形数据的速率相匹配,再加上一个与噪声向量之间以及与数据向量的内积的有界性成比例的附加项。基于我们的分析,即不是精确的双随机归一化而是近似的双随机归一化将实现相同的一致性速率,我们提出了一个近似且受约束的矩阵缩放问题,该问题可以通过带有早期终止的SK迭代来解决。数值实验支持了我们的理论结果,并展示了双随机归一化图拉普拉斯算子对高维离群值噪声的鲁棒性。