Department of Natural and Applied Sciences, Bentley University, Waltham, Massachusetts, USA.
Health Thought Leadership Network, Bentley University, Waltham, Massachusetts, USA.
Suicide Life Threat Behav. 2022 Aug;52(4):782-791. doi: 10.1111/sltb.12862. Epub 2022 Apr 6.
Objective: To improve the accuracy of classification of deaths of undetermined intent and to examine racial differences in misclassification.
Methods: We used natural language processing and statistical text analysis on restricted-access case narratives of suicides, homicides, and undetermined deaths in 37 states collected from the National Violent Death Reporting System (NVDRS) (2017). We fit separate race-specific classification models to predict suicide among undetermined cases using data from known homicide cases (true negatives) and known suicide cases (true positives).
Results: A classifier trained on an all-race dataset predicts less than half of these cases as suicide. Importantly, our analysis yields an estimated suicide rate for the Black population comparable with the typical detection rate for the White population, indicating that misclassification excess is endemic for Black suicide. This problem may be mitigated by using race-specific data. Our findings, based on the statistical text analysis, also reveal systematic differences in the phrases identified as most predictive of suicide.
Conclusions: This study highlights the need to understand the reasons underlying suicide rate differences and for further testing of strategies to reduce misclassification, particularly among people of color.
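The setup described in the methods (fit one text classifier per group on narratives with known intent, then apply it to undetermined cases) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the classifier, so a multinomial naive Bayes model over whitespace tokens stands in for it; the group labels `group_a` and `group_b`, the function names, and the short narratives are all hypothetical and are not NVDRS data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on labeled narratives."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set().union(*word_counts.values())
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_prior, word_counts, vocab


def predict_nb(model, doc):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    log_prior, word_counts, vocab = model
    scores = {}
    for c in log_prior:
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


def fit_by_group(records):
    """Fit one model per group from (group, narrative, label) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, text, label in records:
        grouped[group][0].append(text)
        grouped[group][1].append(label)
    return {g: train_nb(docs, labels) for g, (docs, labels) in grouped.items()}


# Invented toy narratives with known intent (hypothetical, not real case data).
TRAIN = [
    ("group_a", "decedent left a note and had a history of depression", "suicide"),
    ("group_a", "decedent had expressed intent and left a note", "suicide"),
    ("group_a", "decedent was shot by a suspect during an argument", "homicide"),
    ("group_a", "suspect fled after an argument with the decedent", "homicide"),
    ("group_b", "decedent had recent crisis and wrote a message", "suicide"),
    ("group_b", "family reported decedent had spoken of intent", "suicide"),
    ("group_b", "suspect fled after an argument with the decedent", "homicide"),
    ("group_b", "decedent was struck by a suspect", "homicide"),
]

models = fit_by_group(TRAIN)

# Apply the group-specific model to a narrative of undetermined intent.
print(predict_nb(models["group_a"], "decedent left a note"))
```

Fitting a separate model per group lets each model weight phrases by how they are used in that group's narratives, which is the intuition behind comparing race-specific models with a single pooled classifier. A real analysis would need a far larger corpus, a proper tokenizer, and held-out evaluation.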