Guo Yuting, Kim Sangmi, Warren Elise, Yang Yuan-Chi, Lakamana Sahithi, Sarker Abeed
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States.
School of Nursing, Emory University, Atlanta, GA, United States.
AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:254-260. eCollection 2023.
Social media platforms are increasingly being used by intimate partner violence (IPV) victims to share experiences and seek support. If such information is automatically curated, it may be possible to conduct social media-based surveillance and even design interventions over such platforms. In this paper, we describe the development of a supervised classification system that automatically characterizes IPV-related posts on the social network Reddit. We collected data from four IPV-related subreddits and manually annotated each post to indicate whether or not it is a self-report of IPV. Using the annotated data (N=289), we trained, evaluated, and compared supervised machine learning systems. A transformer-based classifier, RoBERTa, obtained the best classification performance, with an overall accuracy of 78% and an F1-score of 0.67 for the IPV-self-report class. Post-classification error analyses revealed that misclassifications often occur for posts that are very long or are non-first-person reports of IPV. Despite the relatively small annotated dataset, our classification methods obtained promising results, indicating that it may be possible to detect and, hence, provide support to IPV victims over Reddit.
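The two metrics reported above, overall accuracy and the F-score for the self-report class, can be illustrated with a minimal sketch. It assumes the F-score is the standard F1 (harmonic mean of precision and recall) and that labels are encoded as 1 for an IPV self-report and 0 otherwise. The function names and the toy labels are hypothetical and are not taken from the study's data or code.

```python
def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the annotated label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_f1(y_true, y_pred, positive=1):
    """F1 for a single class (assumed here: 1 = IPV self-report)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only; not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"self-report F1 = {class_f1(y_true, y_pred):.2f}")
```

Reporting the class-level F-score alongside accuracy is useful because accuracy alone can look strong when one class dominates, while the per-class score reflects how well the self-report posts themselves are identified.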