Ito Yusei, Takeichi Yasuo, Hino Hideitsu, Ono Kanta
Department of Applied Physics, Osaka University, 2-1 Yamadaoka, Suita, 565-0871, Osaka, Japan.
The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo, 190-8562, Japan.
Sci Rep. 2024 Sep 29;14(1):22549. doi: 10.1038/s41598-024-74016-0.
We propose and demonstrate a clustering method that addresses the "needle-in-a-haystack problem": finding minuscule but important regions in massive spectral image data sets. This problem is of central importance in the characterization of materials because, in bulk materials, the properties of a very small region often dominate the overall function. To solve it, we propose a rational way to partition the spectral feature space in which the spectra are distributed (that is, to define the decision boundaries for clustering): focus on the discrimination limit set by the measurement noise and partition the space at intervals of this limit. We verified the proposed method and applied it to actual measurement data, succeeding in detecting tiny (~0.5%) important regions that human researchers and other machine learning methods had difficulty detecting when searching for unknown phases. The ability to detect these crucial regions helps in understanding materials and in designing more functional materials.
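The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the spectra have already been reduced to a low-dimensional feature vector per pixel, that the measurement noise is summarized by a single standard deviation `noise_sigma`, and that the discrimination limit is `k * noise_sigma`. The function name `noise_limited_partition` and the parameter `k` are hypothetical.

```python
import numpy as np


def noise_limited_partition(features, noise_sigma, k=1.0):
    """Label each feature vector by the grid cell it falls in.

    The cell edge is k * noise_sigma, an assumed stand-in for the
    noise-defined discrimination limit described in the abstract.
    Returns (labels, counts): one cell label per row and the number
    of rows in each occupied cell.
    """
    features = np.asarray(features, dtype=float)
    step = k * float(noise_sigma)
    # Integer grid coordinates of each point at the chosen resolution.
    cells = np.floor(features / step).astype(np.int64)
    # Each distinct occupied cell becomes one cluster.
    _, labels, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    return labels.ravel(), counts


# Synthetic illustration (made-up data): a dominant phase plus a 0.5% minority.
rng = np.random.default_rng(0)
major = rng.normal(loc=0.0, scale=0.02, size=(995, 2))
minor = rng.normal(loc=1.0, scale=0.02, size=(5, 2))
labels, counts = noise_limited_partition(
    np.vstack([major, minor]), noise_sigma=0.02, k=5.0
)
fractions = counts / counts.sum()
print("occupied cells:", len(counts))
print("smallest cell fraction:", fractions.min())
```

Because the cell size is tied to the noise level rather than to cluster populations, a sparsely populated cell is not absorbed into a large neighbor, which is the property the abstract relies on for finding rare regions. A fixed grid can still split one physical phase across adjacent cells, so a practical version would likely need a merging step.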