Department of Environmental Health Sciences, Columbia University, New York, New York, United States of America.
Department of Epidemiology, Columbia University, New York, New York, United States of America.
PLoS Comput Biol. 2023 Mar 13;19(3):e1010945. doi: 10.1371/journal.pcbi.1010945. eCollection 2023 Mar.
Deaths by suicide, as well as suicidal ideations, plans and attempts, have been increasing in the US for the past two decades. Deployment of effective interventions would require timely, geographically well-resolved estimates of suicide activity. In this study, we evaluated the feasibility of a two-step process for predicting suicide mortality: a) generation of hindcasts, that is, mortality estimates for past months for which observational data would not yet have been available if forecasts were generated in real time; and b) generation of forecasts from observational data augmented with hindcasts. Calls to crisis hotline services and online queries to the Google search engine for suicide-related terms were used as proxy data sources to generate hindcasts. The primary hindcast model (auto) is an Autoregressive Integrated Moving Average (ARIMA) model trained on suicide mortality rates alone. Three regression models augment hindcast estimates from auto with call rates (calls), GHT search rates (ght) and both datasets together (calls_ght). The four forecast models are ARIMA models trained with the corresponding hindcast estimates. All models were evaluated against a baseline random walk with drift model. Rolling monthly 6-month-ahead forecasts were generated for all 50 states between 2012 and 2020. Quantile score (QS) was used to assess the quality of the forecast distributions. Median QS for auto was better than baseline (0.114 vs. 0.21). The median QS of each augmented model was lower than that of auto, but the augmented models were not significantly different from each other (Wilcoxon signed-rank test, p > .05). Forecasts from augmented models were also better calibrated. Together, these results provide evidence that proxy data can address delays in the release of suicide mortality data and improve forecast quality. An operational forecast system of state-level suicide risk may be feasible with sustained engagement between modelers and public health departments to appraise data sources and methods as well as to continuously evaluate forecast accuracy.
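As an illustration of the evaluation setup described above, the following minimal Python sketch builds quantile forecasts from a random walk with drift baseline and scores them with a quantile score. It is not the study's code. The function names (`rw_drift_quantiles`, `quantile_score`), the Gaussian forecast errors, and the definition of QS as the pinball loss averaged over quantile levels and horizons are all assumptions made here for illustration; the abstract does not specify them.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def rw_drift_quantiles(y, horizon, levels):
    """Quantile forecasts for 1..horizon steps ahead from a random walk with drift.

    Assumption (not from the source): errors are Gaussian, and uncertainty in
    the estimated drift is ignored. Needs at least three observations.
    """
    diffs = [b - a for a, b in zip(y, y[1:])]
    drift = mean(diffs)    # average month-to-month change
    sigma = stdev(diffs)   # sample standard deviation of the changes
    z = [NormalDist().inv_cdf(q) for q in levels]
    # h-step-ahead mean is y[-1] + drift * h; spread grows with sqrt(h)
    return [
        [y[-1] + drift * h + sigma * sqrt(h) * zq for zq in z]
        for h in range(1, horizon + 1)
    ]


def quantile_score(observed, forecasts, levels):
    """Pinball loss averaged over horizons and quantile levels (lower is better).

    Assumption: this is one common definition of a quantile score; the
    study's exact formula may differ.
    """
    losses = []
    for obs, row in zip(observed, forecasts):
        for q, pred in zip(levels, row):
            err = obs - pred
            losses.append(max(q * err, (q - 1) * err))
    return mean(losses)


if __name__ == "__main__":
    # Hypothetical monthly mortality rates, for illustration only
    history = [1.10, 1.15, 1.12, 1.20, 1.18, 1.25]
    levels = [0.05, 0.25, 0.5, 0.75, 0.95]
    fc = rw_drift_quantiles(history, horizon=6, levels=levels)
    later = [1.27, 1.24, 1.30, 1.33, 1.31, 1.36]
    print(quantile_score(later, fc, levels))
```

In a comparison like the one described in the abstract, the same scoring function would be applied to each model's forecast quantiles, so that scores are directly comparable across the baseline, auto, and the augmented models.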