School of Civil & Environment Engineering, Queensland University of Technology, 2 George Street, Brisbane, 4000 QLD, Australia.
School of Mathematical Sciences, Queensland University of Technology, 2 George Street, Brisbane, 4000 QLD, Australia.
Accid Anal Prev. 2024 Oct;206:107690. doi: 10.1016/j.aap.2024.107690. Epub 2024 Jul 4.
Analyzing crash data is a complex and labor-intensive process that requires careful consideration of multiple interdependent modeling aspects, such as functional forms, transformations, likely contributing factors, correlations, and unobserved heterogeneity. Limited time, knowledge, and experience may lead to over-simplified, over-fitted, or misspecified models that overlook important insights. This paper proposes an extensive hypothesis testing framework, comprising a multi-objective mathematical programming formulation and solution algorithms, to estimate crash frequency models that simultaneously consider likely contributing factors, transformations, non-linearities, and correlated random parameters. The mathematical programming formulation optimizes both in-sample fit and out-of-sample prediction. To address the complexity and non-convexity of the mathematical program, the proposed solution framework uses a variety of metaheuristic solution algorithms. Among these, Harmony Search showed minimal sensitivity to its hyperparameters, allowing an efficient search for solutions regardless of how those hyperparameters are chosen. The effectiveness of the framework was evaluated using two real-world datasets and one synthetic dataset. For the two real-world datasets, comparative analyses were performed against the corresponding models published in the literature by independent teams. The proposed framework proved capable of pinpointing efficient model specifications, producing accurate estimates, and providing valuable insights for both researchers and practitioners. The approach allows numerous insights to be discovered while minimizing the time spent on model development. By considering a broader set of contributing factors, it can generate models of varying quality.
For instance, when applied to crash data from Queensland, the proposed approach revealed that providing medians on sharply curved roads can effectively reduce the occurrence of crashes. When applied to crash data from Washington, the simultaneous consideration of traffic volume and road curvature resulted in a notable reduction in crash variances but an increase in crash means.
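The abstract does not spell out the formulation, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It treats each candidate contributing factor as a hypothetical include/exclude bit, scores every specification on two minimized objectives (an in-sample criterion and an out-of-sample error), searches with a basic binary Harmony Search, and returns the Pareto front of all specifications evaluated. The names `harmony_search`, `toy_objective` and `TRUE_FACTORS`, the dominance-based replacement rule, and the toy objective itself are illustrative assumptions; `hms`, `hmcr` and `par` follow the usual Harmony Search terms (harmony memory size, harmony memory considering rate, pitch adjusting rate). In practice each objective evaluation would fit a crash frequency count model, and the paper's search also covers transformations, non-linearities and correlated random parameters, which this sketch omits.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]        # 1 = candidate factor included, 0 = excluded (assumed encoding)
Scores = Tuple[float, float]  # (in-sample criterion, out-of-sample error), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored: List[Tuple[Mask, Scores]]) -> List[Tuple[Mask, Scores]]:
    """Keep only specifications that no other evaluated specification dominates."""
    return [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]


def harmony_search(objective: Callable[[Mask], Scores], n_vars: int, hms: int = 10,
                   hmcr: float = 0.9, par: float = 0.3, iters: int = 500,
                   seed: int = 0) -> List[Tuple[Mask, Scores]]:
    rng = random.Random(seed)
    # Harmony memory: a small population of candidate specifications.
    memory: List[Mask] = [tuple(rng.randint(0, 1) for _ in range(n_vars)) for _ in range(hms)]
    archive: Dict[Mask, Scores] = {}  # every specification evaluated so far
    for m in memory:
        if m not in archive:
            archive[m] = objective(m)
    scores: List[Scores] = [archive[m] for m in memory]

    for _ in range(iters):
        bits = []
        for j in range(n_vars):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit            # pitch adjustment (a bit flip for binary decisions)
            else:
                bit = rng.randint(0, 1)      # random selection
            bits.append(bit)
        new: Mask = tuple(bits)
        if new not in archive:               # avoid refitting a specification already tried
            archive[new] = objective(new)
        score = archive[new]
        # Replace the worst (by summed objectives) memory member that the new harmony dominates.
        dominated = [i for i in range(hms) if dominates(score, scores[i])]
        if dominated:
            worst = max(dominated, key=lambda i: sum(scores[i]))
            memory[worst], scores[worst] = new, score

    return pareto_front(list(archive.items()))


# Hypothetical stand-in for fitting a crash frequency model: factors 0, 2 and 5 are "real".
TRUE_FACTORS = {0, 2, 5}


def toy_objective(mask: Mask) -> Scores:
    k = sum(mask)
    missed = sum(1 for j in TRUE_FACTORS if not mask[j])
    spurious = k - (len(TRUE_FACTORS) - missed)
    in_sample = 10.0 * missed + 5.0 / (1 + k)       # extra terms always improve in-sample fit
    out_of_sample = 10.0 * missed + 1.0 * spurious  # spurious terms hurt prediction
    return in_sample, out_of_sample


if __name__ == "__main__":
    front = harmony_search(toy_objective, n_vars=8)
    for mask, (fit, pred) in sorted(front, key=lambda s: s[1]):
        print(mask, round(fit, 3), round(pred, 3))
```

With this toy objective, any mask that misses a "true" factor is dominated by every mask that includes all three, so the returned front should contain only specifications with factors 0, 2 and 5, each trading additional spurious terms (better in-sample score) against a worse out-of-sample score. Caching evaluated specifications in `archive` reflects the practical point that refitting a model is the expensive step.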