Mean Shift Cluster Recognition Method Implementation in the Nested Sampling Algorithm

Authors

Trassinelli Martino, Ciccodicola Pierre

Affiliation

Institut des NanoSciences de Paris, CNRS, Sorbonne Université, 4 Place Jussieu, 75005 Paris, France.

Publication

Entropy (Basel). 2020 Feb 6;22(2):185. doi: 10.3390/e22020185.

DOI:10.3390/e22020185

PMID:33285961

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516612/

Abstract

Nested sampling is an efficient algorithm for calculating the Bayesian evidence and the posterior probability distributions of the parameters. It explores the parameter space step by step with Monte Carlo sampling, using a collection of parameter-value sets, called live points, that evolve towards the region of interest, i.e., where the likelihood function is maximal. When several local likelihood maxima are present, the algorithm has difficulty converging, and unexplored regions of the parameter volume can also introduce systematic errors. To avoid this, various methods have been proposed in the literature for an efficient search of new live points, even in the presence of local maxima. Here we present a new solution based on the mean shift cluster recognition method, implemented in a random walk search algorithm. The cluster recognition is integrated into the Bayesian analysis program NestedFit and is tested on several difficult cases. Compared with the analysis without cluster recognition, the computation time is considerably reduced. At the same time, the entire parameter space is explored efficiently, which translates into a smaller uncertainty in the extracted value of the Bayesian evidence.
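To make the role of the live points concrete, here is a minimal nested sampling loop in Python. It is a sketch under stated assumptions, not NestedFit's code: parameters live in the unit cube with a uniform prior, the prior volume after iteration i is approximated by its expected value X_i = exp(-i/N) for N live points, the evidence is accumulated as Z ≈ sum of L_i (X_(i-1) - X_i) plus the remaining live points' share of the final volume, and each new live point comes from a constrained random walk whose Gaussian step size is the mean spread of the current live points (an assumed rule). The names `nested_sampling`, `_random_walk`, and `walk_steps` are hypothetical.

```python
import math
import random
import statistics


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def _random_walk(start, logl_min, log_likelihood, step, n_steps, rng):
    """Random walk over the uniform prior on the unit cube, restricted to logL > logl_min."""
    x, lx = list(start), log_likelihood(start)
    for _ in range(n_steps):
        trial = [xi + rng.gauss(0.0, step) for xi in x]
        # Accept only moves that stay inside the prior box and above the threshold
        if all(0.0 <= t <= 1.0 for t in trial):
            lt = log_likelihood(trial)
            if lt > logl_min:
                x, lx = trial, lt
    return x, lx


def nested_sampling(log_likelihood, sample_prior, n_live=100, n_iter=1000,
                    walk_steps=30, rng=None):
    """Return an estimate of log Z for a uniform prior on the unit cube (sketch)."""
    rng = rng if rng is not None else random.Random(0)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(p) for p in live]
    dim = len(live[0])
    log_z = -math.inf
    log_x_prev = 0.0  # log X_0 = log 1: the whole prior volume
    log_shrink = math.log1p(-math.exp(-1.0 / n_live))  # log(1 - exp(-1/N))

    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        # Weight w_i = X_(i-1) - X_i = X_(i-1) * (1 - exp(-1/N))
        log_z = _logaddexp(log_z, logl_min + log_x_prev + log_shrink)
        log_x_prev = -i / n_live

        # Assumed step-size rule: mean per-dimension spread of all live points
        step = max(statistics.mean(
            [statistics.pstdev([p[d] for p in live]) for d in range(dim)]), 1e-12)
        # Start the walk from a random live point other than the discarded one
        k = rng.randrange(n_live - 1)
        if k >= worst:
            k += 1
        live[worst], live_logl[worst] = _random_walk(
            live[k], logl_min, log_likelihood, step, walk_steps, rng)

    # The remaining live points share the final prior volume X_final equally
    log_sum_l = -math.inf
    for lx in live_logl:
        log_sum_l = _logaddexp(log_sum_l, lx)
    return _logaddexp(log_z, log_x_prev + log_sum_l - math.log(n_live))


if __name__ == "__main__":
    sigma = 0.05
    # 1D Gaussian likelihood centred at 0.5; exact Z is close to sigma * sqrt(2 pi)
    est = nested_sampling(lambda p: -(p[0] - 0.5) ** 2 / (2 * sigma ** 2),
                          lambda r: [r.random()], rng=random.Random(1))
    print(est, math.log(sigma * math.sqrt(2 * math.pi)))
```

With several well-separated maxima, the spread of all live points, and hence the step size, is set by the distance between the peaks rather than by their widths, so most proposals made from inside one peak fall below the likelihood threshold. This is one way the convergence difficulty described above can show up.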

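The abstract states that cluster recognition is based on the mean shift method. The sketch below shows the generic technique applied to a set of live points; it is not NestedFit's implementation. It assumes a Gaussian kernel with a fixed, user-chosen `bandwidth`, and it merges converged modes closer than half a bandwidth into one cluster (an arbitrary choice). The names `mean_shift`, `_sqdist`, and `bandwidth` are hypothetical.

```python
import math
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Label points by the kernel-density mode their mean shift trajectory reaches.

    Gaussian kernel of width `bandwidth`; returns (labels, centers).
    """
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Gaussian kernel weight of every point seen from the current position
            w = [math.exp(-_sqdist(x, q) / (2.0 * bandwidth ** 2)) for q in points]
            total = sum(w)
            if total == 0.0:  # all weights underflowed; stop where we are
                break
            # Mean shift update: move x to the kernel-weighted mean of the points
            new_x = [sum(wi * q[d] for wi, q in zip(w, points)) / total
                     for d in range(len(x))]
            moved = _sqdist(new_x, x)
            x = new_x
            if moved < tol ** 2:  # step length below tol: converged
                break
        modes.append(x)

    # Assumed merging rule: modes closer than half a bandwidth form one cluster
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if _sqdist(m, c) < (bandwidth / 2.0) ** 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated groups of "live points" in the unit square
    pts = ([[rng.gauss(0.2, 0.02), rng.gauss(0.2, 0.02)] for _ in range(30)]
           + [[rng.gauss(0.8, 0.02), rng.gauss(0.7, 0.02)] for _ in range(30)])
    labels, centers = mean_shift(pts, bandwidth=0.1)
    print(len(centers), labels)
```

In a nested sampling run, one plausible use of the labels is to start each random walk from a live point of a randomly chosen cluster and to scale its step size by that cluster's own spread, so that well-separated likelihood maxima do not force a single large step size on all of them. How NestedFit chooses the kernel, bandwidth, and per-cluster step is not specified in this abstract.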