Acuña Caicedo Roberto Wellington, Gómez Soriano José Manuel, Melgar Sasieta Héctor Andrés
Carrera de Tecnología de la Información, Universidad Estatal del Sur de Manabí, Ecuador.
Departamento de Ingeniería, Sección de Ingeniería Informática, Escuela de Posgrado, Pontificia Universidad Católica del Perú, Lima, Peru.
Heliyon. 2020 Aug 3;6(8):e04412. doi: 10.1016/j.heliyon.2020.e04412. eCollection 2020 Aug.
According to the World Health Organization (WHO), close to 800,000 people worldwide die by suicide each year, and many more attempt it. Consequently, the WHO recognizes suicide as a global public health priority that affects not only rich countries but also poor and middle-income countries. This study systematically analyzes 28 supervised classifiers, using different features of the Life Corpus, to detect messages with suicidal ideation and depression and to determine whether such classifiers can be used in an automatic online prevention system. The Life Corpus is a bilingual text corpus (English and Spanish) oriented to the detection of suicidal ideation. It was constructed by retrieving texts from several social networks, and its quality was measured using mutual annotation agreement. The experiments determined that the best-performing classifier was KStar with the corpus features POS-SYNSETS-NUM, which achieved a ROC area of 0.81036 and an F-measure of 0.7148. The research thus fulfilled its objective of discovering which supervised classifiers and which features are most suitable for the automatic classification of messages with suicidal ideation using the Life Corpus. In addition, given the imbalance of the results, a new precision measure was developed, called the Two-dimensional Accuracy and Recovery Index (GDP), which can provide better results in unbalanced systems than the usual measures of result quality (F-measure, ROC area). It makes it possible to increase the number of detected messages at risk of suicidal ideation at the cost of receiving more messages that are not related to suicide, or vice versa.
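The trade-off described above, between catching more at-risk messages and receiving more unrelated ones, can be illustrated with the standard precision, recall, and F-measure definitions. The following is a minimal sketch only: the `ConfusionCounts` type and the counts in the example are hypothetical and are not taken from the study, and the GDP index itself is not reproduced because its formula is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical binary confusion counts; the positive class is 'at-risk message'."""
    tp: int  # at-risk messages correctly flagged
    fp: int  # unrelated messages wrongly flagged
    fn: int  # at-risk messages missed
    tn: int  # unrelated messages correctly ignored


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged messages that are truly at risk."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of truly at-risk messages that were flagged."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all messages classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f_measure(c: ConfusionCounts, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Illustrative, made-up counts for an imbalanced set of 1,000 messages,
# of which only 50 are at risk.
example = ConfusionCounts(tp=30, fp=10, fn=20, tn=940)

print(f"accuracy  = {accuracy(example):.3f}")
print(f"precision = {precision(example):.3f}")
print(f"recall    = {recall(example):.3f}")
print(f"F1        = {f_measure(example):.3f}")
print(f"F2        = {f_measure(example, beta=2.0):.3f}")
```

With these made-up counts, accuracy is high even though 40% of the at-risk messages are missed, which is why measures that look at precision and recall separately are preferred on imbalanced data.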