Manifold-adaptive dimension estimation revisited.

Author information

Benkő Zsigmond, Stippinger Marcell, Rehus Roberta, Bencze Attila, Fabó Dániel, Hajnal Boglárka, Eröss Loránd G, Telcs András, Somogyvári Zoltán

Affiliations

Department of Computational Sciences, Wigner Research Centre for Physics, Budapest, Hungary.

János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Budapest, Hungary.

Publication information

PeerJ Comput Sci. 2022 Jan 6;8:e790. doi: 10.7717/peerj-cs.790. eCollection 2022.

DOI:10.7717/peerj-cs.790

PMID:35111907

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8771813/

Abstract

Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates under the assumption that the local manifold density is uniform. Based on this probability density function, we propose the median of local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics, the mode and the mean. Additionally, from the probability density function we derive the maximum likelihood formula for global intrinsic dimensionality under an i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected median-FSA estimator with other kNN estimators: maximum likelihood (Levina-Bickel), 2NN, and two implementations of DANCo (R and MATLAB). We show that the corrected median-FSA estimator beats the maximum likelihood estimator and is on an equal footing with DANCo on standard synthetic benchmarks according to the mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in neural dynamics during resting state and during epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for being seizure onset zones.
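The median-of-local-estimates idea can be illustrated with a minimal sketch. The abstract does not give the formula, so the code assumes the commonly cited form of the local FSA estimate, ln 2 / ln(R_2k / R_k), where R_k is the distance from a point to its k-th nearest neighbor. The function names, the brute-force neighbor search, the choice k = 5, and the toy dataset are illustrative assumptions, not the authors' implementation, and the exponential edge and finite-sample correction is not applied.

```python
import math
import random
import statistics


def local_fsa_estimates(points, k=5):
    """Local estimates ln 2 / ln(R_2k / R_k), one per point (assumed FSA form)."""
    estimates = []
    for i, p in enumerate(points):
        # Brute-force sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k, r_2k = dists[k - 1], dists[2 * k - 1]
        # Skip degenerate neighborhoods (duplicate points or tied distances).
        if r_k > 0 and r_2k > r_k:
            estimates.append(math.log(2) / math.log(r_2k / r_k))
    return estimates


def median_fsa(points, k=5):
    """Global estimate: median of the local estimates (uncorrected)."""
    return statistics.median(local_fsa_estimates(points, k))


# Toy data (hypothetical): a 2-D uniform square embedded linearly in 3-D.
rng = random.Random(0)
sample = [(u, v, u + v)
          for u, v in ((rng.random(), rng.random()) for _ in range(500))]
estimate = median_fsa(sample, k=5)
print(f"uncorrected median-FSA estimate: {estimate:.2f}")
```

On this toy set the estimate should land near the intrinsic dimension of 2 rather than the embedding dimension of 3. For larger datasets, a tree-based neighbor search would replace the quadratic brute-force loop.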
