National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention, Atlanta, GA, USA.
J Affect Disord. 2023 Dec 1;342:63-68. doi: 10.1016/j.jad.2023.08.141. Epub 2023 Sep 11.
Suicide mortality data are a critical source of information for understanding suicide-related trends in the United States. However, official suicide mortality data are subject to significant reporting delays. The Google Symptom Search Dataset (SSD), a novel population-level data source derived from online search behavior, has not been evaluated for its utility in predicting suicide mortality trends.
We identified five mental health-related variables (suicidal ideation, self-harm, depression, major depressive disorder, and pain) from the SSD. We used daily search trends for these symptoms in a linear regression model (the Mental Health Model) to estimate national and state suicide counts in 2020, the most recent year for which data were available. We compared the performance of this model with a baseline autoregressive integrated moving average (ARIMA) model and a model including all 422 symptoms in the SSD (the All Symptoms model).
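The regression step can be illustrated with a minimal sketch. It assumes that daily search values are averaged into monthly means and that the model is ordinary least squares with an intercept, trained on earlier months and applied to the held-out year. The symptom column names, the fixed 30-day month, the 36/12 train/test split, and all data are hypothetical; the abstract does not specify these details, and this is not the authors' code.

```python
import numpy as np

# Hypothetical column order for the five SSD symptoms named in the abstract.
SYMPTOMS = ["suicidal_ideation", "self_harm", "depression",
            "major_depressive_disorder", "pain"]


def to_monthly(daily, days_per_month=30):
    """Average daily search values (n_days x n_symptoms) into monthly means.

    Assumes every month has the same number of days, purely for illustration.
    """
    n_months = daily.shape[0] // days_per_month
    trimmed = daily[: n_months * days_per_month]
    return trimmed.reshape(n_months, days_per_month, -1).mean(axis=1)


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


# Synthetic stand-in data: 48 months of daily searches (36 training, 12 held out).
rng = np.random.default_rng(0)
daily = rng.uniform(1.0, 10.0, size=(48 * 30, len(SYMPTOMS)))
monthly = to_monthly(daily)
true_coef = np.array([2500.0, 40.0, 25.0, 15.0, 10.0, 5.0])
deaths = predict(true_coef, monthly)  # noiseless, so OLS should recover true_coef

train_X, test_X = monthly[:36], monthly[36:]
train_y, test_y = deaths[:36], deaths[36:]

coef = fit_ols(train_X, train_y)
annual_estimate = predict(coef, test_X).sum()
print(round(annual_estimate), round(test_y.sum()))
```

Because the synthetic targets are an exact linear function of the features, the fitted coefficients match the generating ones; with real SSD and mortality data, the fit would of course carry residual error.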
Our Mental Health Model estimated the national number of suicide deaths with an error of -3.86%, compared with errors of 7.17% and 28.49% for the ARIMA baseline and All Symptoms models, respectively. At the state level, 70% (N = 35) of states had a prediction error of <10% with the Mental Health Model, with accuracy generally favoring more populous states with higher numbers of suicide deaths.
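The abstract reports signed errors without defining them. Assuming the conventional signed percent error, (predicted - observed) / observed × 100, a negative value such as -3.86% would mean the model under-estimated the observed count. The helper names below are hypothetical, and the example values are invented to reproduce the reported -3.86% and the 35-of-50-states share; they are not study data.

```python
def percent_error(predicted, observed):
    """Signed percent error; negative values indicate under-prediction."""
    return (predicted - observed) / observed * 100.0


def share_within(errors, threshold=10.0):
    """Fraction of units (e.g. states) whose absolute percent error is below threshold."""
    return sum(abs(e) < threshold for e in errors) / len(errors)


# Hypothetical national totals chosen to reproduce a -3.86% error.
print(round(percent_error(96.14, 100.0), 2))  # -3.86

# Hypothetical state errors: 35 states under 10% in absolute value, 15 above.
state_errors = [4.0] * 35 + [-15.0] * 15
print(share_within(state_errors))  # 0.7
```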
The Google SSD is a new real-time data source that can be used to make accurate predictions of monthly suicide mortality trends at the national level. Additional research is needed to optimize state-level predictions for states with low suicide counts.