Zhu H, Xia X, Yao J, Fan H, Wang Q, Gao Q
Department of Epidemiology and Health Statistics & Beijing Municipal Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, China.
Key Laboratory of Cardiovascular Epidemiology & Department of Epidemiology, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, China.
J Psychiatr Res. 2020 May;124:123-130. doi: 10.1016/j.jpsychires.2020.02.019. Epub 2020 Feb 22.
To compare the performance of text mining methods for screening suicidal behaviors based on the chief complaints of psychiatric inpatients.
Electronic medical records of inpatients with mental disorders were collected. A text mining approach was adopted to screen for suicidal behaviors. Different combinations of six algorithms and two term weighting factors were compared under various training set sizes, with performance assessed by precision, recall, F1 value and accuracy.
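As a rough illustration of the term weighting and evaluation steps described above, the following minimal sketch computes raw term frequency (TF) vectors and the four reported metrics. It is not the authors' implementation: the function names `term_frequency` and `evaluate`, the toy vocabulary, the tokens and the labels are all hypothetical, and the six classifiers are omitted.

```python
from collections import Counter


def term_frequency(tokens, vocabulary):
    """Return raw term counts for `tokens`, ordered by `vocabulary`."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def evaluate(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy data: a three-term vocabulary and one tokenized complaint.
vocabulary = ["suicide", "notion", "excitement"]
tf_vector = term_frequency(["suicide", "notion", "suicide"], vocabulary)

# Hypothetical labels: 1 = suicidal behavior, 0 = no suicidal behavior.
metrics = evaluate(y_true=[1, 1, 0, 0], y_pred=[1, 0, 0, 0])
```

In this sketch the TF vector would be passed to a classifier as its feature input, and `evaluate` would be applied to that classifier's predictions on a held-out test set.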
A total of 3600 psychiatric inpatients (1800 with suicidal behaviors and 1800 without) were included in this study. In the chief complaints of inpatients with suicidal behaviors, "suicide", "notion" and "suspicion" were the most common terms, appearing 1228, 705 and 638 times, respectively. In contrast, "excitement", "instability" and "impulsion" appeared more frequently in the chief complaints of patients without suicidal behaviors (599, 599 and 534 times, respectively). The performance of each algorithm generally improved as the training set size increased and tended to stabilize once the number of training cases reached 1000, at which point most algorithms achieved satisfactory accuracy (>0.95). Results on the test set showed that SVM, Random Forest and AdaBoost weighted by term frequency (TF) had better generalization ability, with F1 values of 0.9889, 0.9838 and 0.9828, respectively.
This study confirmed the feasibility of screening inpatients with suicidal behaviors using a small number of representative terms. SVM, Random Forest and AdaBoost weighted by TF performed better in this task. Our findings provide a practical way to automatically classify patients with or without suicidal behaviors before admission to hospital, which could yield considerable savings in time and human resources for the identification of high-risk patients and for suicide prevention.