A new bin size index method for statistical analysis of multimodal datasets from materials characterization.

Author affiliations

Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, MA, 01003, USA.

Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI, 53715, USA.

Publication information

Sci Rep. 2023 Jul 5;13(1):10915. doi: 10.1038/s41598-023-37969-2.

Abstract

This paper presents a statistical data binning method based on normalized standard error, termed the "bin size index" (BSI), which yields an optimized, objective bin size for constructing a rational histogram. Such a histogram facilitates the subsequent deconvolution of multimodal datasets from materials characterization and hence the determination of the underlying probability density functions. Ten datasets in total were used to illustrate the BSI's concepts and algorithms: four normally distributed synthetic datasets, three normally distributed datasets on the elasticity of rocks obtained by statistical nanoindentation, and three lognormally distributed datasets on the particle size distributions of flocculated clay suspensions. Results from the synthetic datasets demonstrate the method's accuracy and effectiveness, and analyses of the real datasets from materials characterization and measurement further demonstrate its rationale, performance, and applicability to practical problems. The BSI method also enables determination of the number of modes by comparing the errors returned from different trial bin sizes. The accuracy and performance of the BSI method are further compared with those of other widely used binning methods, and the BSI method yields the highest BSI and the smallest normalized standard errors. By normalizing the errors by the number of modes hidden in the datasets, the new method specifically penalizes overfitting, which tends to yield too many pseudo-modes, and it also eliminates the difficulty of specifying criteria for acceptable values of the fitting errors. The advantages and disadvantages of the new method are also discussed.
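The abstract compares the BSI against "other widely used binning methods" without naming them here. As a rough illustration of what such baselines look like, the sketch below applies three textbook rules (Sturges, Scott, and Freedman-Diaconis) to a synthetic bimodal normal dataset. The choice of these three rules, the helper names, and the synthetic parameters are assumptions made for illustration; they may not be the methods or data used in the paper, and the BSI itself is not reproduced because the abstract does not give its formula.

```python
import math
import random
import statistics


def sturges_bins(data):
    """Sturges' rule: number of bins k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(data))) + 1


def scott_width(data):
    """Scott's rule: bin width h = 3.49 * s * n^(-1/3), s = sample std."""
    return 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)


def freedman_diaconis_width(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)


def width_to_bins(data, width):
    """Convert a bin width into a bin count over the data range."""
    return max(1, math.ceil((max(data) - min(data)) / width))


def make_bimodal(n_per_mode=500, seed=0):
    """Hypothetical two-mode normal sample (parameters are arbitrary)."""
    rng = random.Random(seed)
    low = [rng.gauss(20.0, 3.0) for _ in range(n_per_mode)]
    high = [rng.gauss(60.0, 5.0) for _ in range(n_per_mode)]
    return low + high


data = make_bimodal()
bin_counts = {
    "sturges": sturges_bins(data),
    "scott": width_to_bins(data, scott_width(data)),
    "freedman_diaconis": width_to_bins(data, freedman_diaconis_width(data)),
}
for rule, k in bin_counts.items():
    print(f"{rule}: {k} bins")
```

Each rule derives a bin size from the sample size and a single spread estimate, so none of them accounts for how many modes the data contain. That is the gap the abstract says the BSI addresses by normalizing fitting errors by the number of modes.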

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0feb/10322845/6d4253c72385/41598_2023_37969_Fig1_HTML.jpg
