Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise.

Authors

Landa Boris, Coifman Ronald R, Kluger Yuval

Affiliations

Program in Applied Mathematics, Yale University.

Interdepartmental Program in Computational Biology and Bioinformatics, Yale University.

Publication information

SIAM J Math Data Sci. 2021;3(1):388-413. doi: 10.1137/20M1342124. Epub 2021 Mar 23.

Abstract

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow with a certain normalization (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
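
The normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the kernel width parameter `eps`, the iteration cap, and the tolerance are all assumptions made for illustration. The sketch assumes a damped symmetric Sinkhorn-type scaling, which searches for a positive vector `d` such that `diag(d) K diag(d)` has unit row and column sums.

```python
import numpy as np


def gaussian_kernel_zero_diag(X, eps):
    """Gaussian kernel exp(-||x_i - x_j||^2 / eps) with a zero main diagonal.

    `eps` (kernel width) is an assumed parameter name, not taken from the source.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)   # clip small negative values caused by round-off
    K = np.exp(-d2 / eps)
    np.fill_diagonal(K, 0.0)   # no self loops
    return K


def doubly_stochastic(K, n_iter=1000, tol=1e-10):
    """Scale a symmetric nonnegative K to W = diag(d) K diag(d) with unit row sums.

    Uses the damped update d <- sqrt(d / (K d)), written below as
    d / sqrt(row_sums); the stopping rule is an illustrative choice.
    """
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        row_sums = d * (K @ d)              # current row sums of diag(d) K diag(d)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        d = d / np.sqrt(row_sums)
    return d[:, None] * K * d[None, :]


# Hypothetical usage on synthetic points (not data from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
W = doubly_stochastic(gaussian_kernel_zero_diag(X, eps=10.0))
print(W.sum(axis=1)[:3])  # row sums should be close to 1
```

Because `K` is symmetric, a single scaling vector serves both rows and columns, so the result stays symmetric. The row-stochastic alternative mentioned in the abstract would instead divide each row of `K` by its own sum.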
