Pestian John, Nasrallah Henry, Matykiewicz Pawel, Bennett Aurora, Leenaars Antoon
Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
Biomed Inform Insights. 2010 Aug 4;2010(3):19-28. doi: 10.4137/bii.s4706.
Suicide is the second leading cause of death among 25-34-year-olds and the third leading cause of death among 15-25-year-olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient's thoughts, as represented by suicide notes. We focus on developing natural language processing methods that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data comprise suicide notes from 33 suicide completers, matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide whether each note was genuine or elicited. Their decisions were compared with those of nine different machine learning algorithms. The results indicate that trainees classified notes accurately 49% of the time, mental health professionals 63% of the time, and the best machine learning algorithm 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicide notes.
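The comparison described in the abstract, scoring a text classifier by how often it labels a note correctly, can be illustrated with a minimal sketch. The abstract does not name the nine algorithms or the validation scheme, so the multinomial Naive Bayes model, the leave-one-out evaluation, the function names, and the placeholder token data below are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (assumed algorithm, not from the paper)."""
    word_counts = {}                 # label -> Counter of token frequencies
    class_counts = Counter(labels)   # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(docs, labels):
    """Hold out each document once, train on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        if predict_nb(model, docs[i]) == labels[i]:
            correct += 1
    return correct / len(docs)


# Hypothetical placeholder corpus: abstract tokens stand in for note text.
toy_docs = ["alpha beta beta", "alpha beta gamma", "delta epsilon", "delta delta zeta"]
toy_labels = ["genuine", "genuine", "elicited", "elicited"]
print(leave_one_out_accuracy(toy_docs, toy_labels))
```

With a small corpus such as the 66 notes described, a hold-one-out scheme of this kind uses every note for testing exactly once, which is one common way to obtain an accuracy figure comparable to the percentages reported for human raters.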