Choi Hyunyul, Jung Kihyo
School of Industrial and Management Engineering, Korea University, Seoul 02841, Republic of Korea.
School of Industrial Engineering, University of Ulsan, Ulsan 44610, Republic of Korea.
Entropy (Basel). 2025 Jul 18;27(7):761. doi: 10.3390/e27070761.
This study investigates the impact of data distribution and bootstrap resampling on the anomaly detection performance of the Isolation Forest (iForest) algorithm in statistical process control. Although iForest has received attention for its multivariate and ensemble-based nature, its performance under non-normal data distributions and varying bootstrap settings remains underexplored. To address this gap, a comprehensive simulation was performed across 18 scenarios involving log-normal, gamma, and t-distributions with different mean shift levels and bootstrap configurations. The results show that iForest substantially outperforms the conventional Hotelling's T² control chart, especially in non-Gaussian settings and under small-to-medium process shifts. Enabling bootstrap resampling led to marginal improvements across the performance metrics, including accuracy, precision, recall, F1-score, and average run length (ARL). However, a key limitation of iForest was its reduced sensitivity to subtle process changes, such as a 1σ mean shift, highlighting an area for future enhancement.
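To make the baseline and the ARL metric concrete, the following is a minimal sketch of a Hotelling's T² chart applied to skewed data, using only the Python standard library. It is not the authors' simulation design and does not implement iForest. The bivariate independent log-normal process, its parameters, the sample sizes, the empirical (quantile-based) control limit, and all function names are assumptions chosen for illustration. ARL is estimated as 1/p, where p is the per-observation signal rate, which holds when signals are independent with constant probability.

```python
import math
import random


def fit_reference(sample):
    """Estimate the mean vector, inverse 2x2 covariance, and per-variable
    standard deviations from in-control bivariate data."""
    n = len(sample)
    m0 = sum(p[0] for p in sample) / n
    m1 = sum(p[1] for p in sample) / n
    s00 = sum((p[0] - m0) ** 2 for p in sample) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in sample) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in sample) / (n - 1)
    det = s00 * s11 - s01 ** 2
    inv = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]
    return (m0, m1), inv, (math.sqrt(s00), math.sqrt(s11))


def t2(point, mean, inv):
    """Hotelling's T^2 for one observation: (x - mean)' S^-1 (x - mean)."""
    d0, d1 = point[0] - mean[0], point[1] - mean[1]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))


def draw_lognormal(rng, n, offset=(0.0, 0.0)):
    """Independent log-normal pairs (assumed parameters mu=0, sigma=0.5),
    optionally displaced by a fixed mean shift."""
    return [(rng.lognormvariate(0.0, 0.5) + offset[0],
             rng.lognormvariate(0.0, 0.5) + offset[1]) for _ in range(n)]


def signal_rate(points, mean, inv, limit):
    """Fraction of observations whose T^2 exceeds the control limit."""
    return sum(t2(p, mean, inv) > limit for p in points) / len(points)


def arl(rate):
    """Average run length for independent signals with constant rate."""
    return math.inf if rate == 0 else 1.0 / rate


def simulate(shift_in_sigma, seed=0, n=2000, alpha=0.01):
    """Fit the chart on in-control data, then score fresh in-control and
    mean-shifted data. The limit is the empirical (1 - alpha) quantile of
    the reference T^2 values, an assumption made because the usual
    F-based limit presumes normality."""
    rng = random.Random(seed)
    reference = draw_lognormal(rng, n)
    mean, inv, sds = fit_reference(reference)
    scores = sorted(t2(p, mean, inv) for p in reference)
    limit = scores[math.ceil((1 - alpha) * n) - 1]

    in_control = draw_lognormal(rng, n)
    offset = (shift_in_sigma * sds[0], shift_in_sigma * sds[1])
    shifted = draw_lognormal(rng, n, offset)

    false_alarm = signal_rate(in_control, mean, inv, limit)
    detection = signal_rate(shifted, mean, inv, limit)
    return {
        "false_alarm_rate": false_alarm,
        "detection_rate": detection,
        "arl0": arl(false_alarm),   # in-control ARL (larger is better)
        "arl1": arl(detection),     # out-of-control ARL (smaller is better)
    }


if __name__ == "__main__":
    for k in (1.0, 3.0):
        print(k, simulate(k))
```

Comparing `simulate(1.0)` with `simulate(3.0)` illustrates why small shifts are the hard case for any chart: the 1σ shift moves few observations past a limit that must accommodate the heavy right tail of the in-control distribution. An iForest-based detector could be evaluated with the same signal-rate and ARL bookkeeping by replacing the T² score with an anomaly score.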