Predictive performance of count regression models versus machine learning techniques: A comparative analysis using an automobile insurance claims frequency dataset.

Author information

Alomair Gadir

Affiliation

Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa, Saudi Arabia.

Publication information

PLoS One. 2024 Dec 31;19(12):e0314975. doi: 10.1371/journal.pone.0314975. eCollection 2024.

DOI:10.1371/journal.pone.0314975

PMID:39739961

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11687910/

Abstract

Accurate forecasting of claim frequency in automobile insurance is essential for insurers to assess risks effectively and set appropriate pricing policies. Traditional methods typically model claim counts with a Poisson distribution; however, because zero-claim periods are frequent, the data often contain excess zeros (zero inflation), which can make the Poisson model inadequate. Zero inflation occurs when more zeros are observed than expected under standard Poisson or negative binomial (NB) models. While machine learning (ML) techniques have been explored for predictive analytics in other contexts, their application to zero-inflated insurance data remains limited. This study investigates whether ML can improve forecast accuracy under zero inflation, a data characteristic common in automobile insurance. The research comparatively evaluated several models on an insurance dataset: Poisson, NB, zero-inflated Poisson (ZIP), hurdle Poisson, zero-inflated negative binomial (ZINB), hurdle negative binomial, random forest (RF), support vector machine (SVM), and artificial neural network (ANN). Model performance was assessed using mean absolute error (MAE). The results reveal that the SVM model achieved the highest predictive accuracy, particularly in handling zero inflation, followed by the ZIP and ZINB models. In contrast, the traditional Poisson and NB models showed lower predictive capability. By addressing the challenge of zero inflation in automobile claim data, this study offers insights into improving the accuracy of claim frequency predictions. Although the study is based on a single dataset, the findings provide useful perspectives on enhancing prediction accuracy and improving risk management practices in the insurance industry.
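The abstract does not describe the author's code, so the following is only a minimal illustrative sketch, using the Python standard library and hypothetical toy data, of three ideas the abstract relies on. First, it offers a simple zero-inflation check: the observed share of zeros versus the share a Poisson model with the same mean would predict, exp(-mean). Second, it gives the probability mass functions of the zero-inflated Poisson (a mixture of structural zeros with probability pi and a Poisson(lam) count) and the hurdle Poisson (zeros with probability p0, positives from a zero-truncated Poisson). Third, it defines the mean absolute error. All function names, parameter values and the claim counts are assumptions for illustration; fitting the regression and ML models themselves is out of scope.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero; otherwise it is drawn from Poisson(lam), which can also yield zero."""
    base = poisson_pmf(k, lam)
    return pi + (1 - pi) * base if k == 0 else (1 - pi) * base


def hurdle_poisson_pmf(k: int, lam: float, p0: float) -> float:
    """Hurdle Poisson: all zeros come from one process (probability p0);
    positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


def excess_zeros(counts: list[int]) -> float:
    """Observed share of zeros minus the share expected under a Poisson
    model whose mean equals the sample mean. A clearly positive value is
    a rough sign of zero inflation (not a formal test)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    return observed - math.exp(-mean)


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical claim counts for ten policies (not from the study's dataset).
    claims = [0, 0, 0, 0, 0, 0, 0, 1, 2, 5]
    print(round(excess_zeros(claims), 3))  # 0.251: 70% zeros vs about 45% expected
    # Compare a constant mean prediction with a constant zero prediction.
    print(round(mean_absolute_error(claims, [0.8] * len(claims)), 2))  # 1.12
    print(round(mean_absolute_error(claims, [0] * len(claims)), 2))  # 0.8
```

The last two lines show a property worth keeping in mind when comparing models by MAE: MAE is minimized by the median rather than the mean, so on zero-heavy count data a prediction near zero can already score well.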
