Bootstrapping semi-supervised annotation method for potential suicidal messages.

Authors

Acuña Caicedo Roberto Wellington, Gómez Soriano José Manuel, Melgar Sasieta Héctor Andrés

Affiliations

Information Technology Undergraduate Program, Universidad Estatal del Sur de Manabí, Jipijapa, Manabí, SENESCYT Scholarship Holder, Ecuador.

Department of Engineering, Computer Engineering Section, Graduate School, Pontificia Universidad Católica del Perú, Lima, Peru.

Publication information

Internet Interv. 2022 Feb 28;28:100519. doi: 10.1016/j.invent.2022.100519. eCollection 2022 Apr.

DOI:10.1016/j.invent.2022.100519

PMID:35281704

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8913319/

Abstract

The suicide of a person is a tragedy that deeply affects families, communities, and countries. Based on the standardized worldwide suicide rate per number of inhabitants, approximately 903,450 suicides and 18,069,000 non-fatal suicide attempts are projected for 2022, affecting people of every age, country, race, belief, social and economic status, and sex. Because users of social networks post their suicidal intentions publicly, researchers have begun working on ways to detect such messages and to encourage their authors not to take their own lives. This study set out to define a semi-supervised method for populating the Life Corpus: starting from an initial set of labeled samples, a bootstrapping technique automatically detects and classifies texts related to suicide and depression extracted from social networks and forums. The experiments used two classifiers: a Support Vector Machine (SVM) with Bag of Words (BoW) features, with and without Term Frequency/Inverse Document Frequency (Tf/Idf) term weighting and with or without stopwords, and Rasa with its default feature extraction system. The experiments covered five data collections: Life, Reddit, Life+Reddit, Life_en, and Life_en + Reddit. With the semi-supervised method, the Life Corpus grew from 102 to 273 samples using texts from the social network Reddit; the Life+Reddit+BoW_Embeddings combination with the SVM classifier achieved a macro F1 of 0.80. The added texts were then evaluated manually by annotators, with a Cohen's Kappa agreement of 0.86.
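The bootstrapping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a tiny Naive Bayes classifier over bag-of-words counts stands in for the SVM so that the example needs no third-party libraries, and the confidence threshold, the function names, and the sample texts are all assumptions made for illustration. The loop trains on the labeled seed, labels the unlabeled pool, moves only confident predictions into the training set, and repeats until no new texts qualify.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a real pipeline would also handle punctuation.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes on bag-of-words counts (stand-in for the SVM)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, text):
        n_docs = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.prior[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # out-of-vocabulary tokens are ignored
                    # Laplace (add-one) smoothing
                    score += math.log(
                        (self.counts[c][tok] + 1) / (total + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize log scores into probabilities.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def bootstrap(seed_texts, seed_labels, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confident predictions (hypothetical helper)."""
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = TinyNaiveBayes().fit(texts, labels)
        confident, remaining = [], []
        for text in pool:
            probs = model.predict_proba(text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to add; stop early
        for text, label in confident:
            texts.append(text)
            labels.append(label)
        pool = remaining
    return texts, labels, pool


# Made-up seed data and unlabeled pool, for illustration only.
seed_texts = [
    "i want to end my life",
    "i feel hopeless and want to die",
    "great weather for a walk today",
    "i love cooking pasta with friends",
]
seed_labels = ["risk", "risk", "neutral", "neutral"]
unlabeled = [
    "i feel so hopeless i want to die",
    "lovely walk with friends today",
    "the printer is out of paper",
]

texts, labels, pool = bootstrap(seed_texts, seed_labels, unlabeled)
```

In this toy run the first two unlabeled texts are added with the labels "risk" and "neutral", while the third shares no vocabulary with the seed and stays in the pool. In the study, the automatically added texts were additionally reviewed by human annotators, a step this sketch does not model.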
