Ha Phan Trinh, D'Silva Rhea, Chen Ethan, Koyutürk Mehmet, Karakurt Günnur
Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA.
Department of Psychology, Case Western Reserve University, Cleveland, OH, USA.
J Comput Soc Sci. 2022 Nov;5(2):1207-1233. doi: 10.1007/s42001-022-00166-8. Epub 2022 May 7.
Intimate partner violence (IPV) is a significant public health problem that adversely affects the well-being of victims. IPV is often under-reported, and non-physical forms of violence may not be recognized as IPV, even by victims. With the increasing popularity of social media, and because of the anonymity that some of these platforms provide, people feel comfortable sharing descriptions of their relationship problems on social media. The content generated on these platforms can be useful in identifying IPV and in characterizing the prevalence, causes, consequences, and correlates of IPV in broad populations. However, these descriptions are in the form of free text, and no corpus of labeled data is available to perform large-scale computational and statistical analyses. Here, we use data from established questionnaires that are used to collect self-report data on IPV to train machine learning models to predict IPV from free text. Using Universal Sentence Encoder (USE) along with multiple machine learning algorithms (random forest, SVM, logistic regression, Naïve Bayes), we develop DetectIPV, a tool for detecting IPV in free text. Using DetectIPV, we comprehensively characterize the predictability of different types of violence (physical abuse, emotional abuse, sexual abuse) from free text. Our results show that a general model trained on examples of all violence types can identify IPV from free text with an area under the ROC curve (AUROC) of 89%. We also train type-specific models and observe that physical abuse can be identified with the greatest accuracy (AUROC 98%), while sexual abuse can be identified with high precision but relatively low recall. While our results indicate that the prediction of emotional abuse is the most challenging, DetectIPV can identify emotional abuse with AUROC above 80%.
These results establish DetectIPV as a tool that can be used to reliably detect IPV in a variety of applications, ranging from flagging social media posts to detecting IPV in large text corpora for research purposes. DetectIPV is available as a web service at https://www.ipvlab.case.edu/ipvdetect/.
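The evaluation measures named above (AUROC, precision, recall) can be illustrated with a minimal sketch. This is not the DetectIPV implementation: the function names `auroc` and `precision_recall`, the 0.5 threshold, and the toy labels and scores are all hypothetical, chosen only to show how a classifier can have perfect precision yet lower recall at a given threshold while its AUROC summarizes ranking quality across all thresholds.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged as positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 1 = text describes IPV, 0 = it does not.
# The scores stand in for a classifier's predicted probabilities.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_precision, toy_recall = precision_recall(toy_labels, toy_scores, threshold=0.5)
```

On this toy data every flagged item is a true positive, but one positive example scores below the threshold and is missed, which is the "high precision, lower recall" pattern described for one of the type-specific models.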