Al-Garadi Mohammed Ali, Kim Sangmi, Guo Yuting, Warren Elise, Yang Yuan-Chi, Lakamana Sahithi, Sarker Abeed
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States.
School of Nursing, Emory University, Atlanta, GA, United States.
Array (N Y). 2022 Sep;15. doi: 10.1016/j.array.2022.100217. Epub 2022 Jul 20.
Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women is estimated to be, or to have been, a victim of severe violence at some point in her life, irrespective of age, ethnicity, or economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need. However, no artificial intelligence system for automatic detection currently exists, and we attempted to address this research gap. We collected posts from Twitter using a list of IPV-related keywords, manually reviewed subsets of the retrieved posts, and prepared annotation guidelines for categorizing tweets as IPV-report or non-IPV-report. We annotated 6,348 tweets in total, with an inter-annotator agreement (IAA) of 0.86 (Cohen's kappa) on 1,834 double-annotated tweets. The class distribution in the annotated dataset was highly imbalanced, with only 668 posts (~11%) labeled as IPV-report. We then developed a natural language processing model to identify IPV-reporting tweets automatically. The model achieved classification F-scores of 0.76 for the IPV-report class and 0.97 for the non-IPV-report class. We conducted post-classification analyses to determine the causes of system errors and to check whether the system exhibited biases in its decision making, particularly with respect to race and gender. Our automatic model can be an essential component of a proactive social media-based intervention and support framework, while also aiding population-level surveillance and large-scale cohort studies.
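The abstract reports three quantities that are easy to misread: Cohen's kappa for inter-annotator agreement, per-class F-scores, and the share of positive labels. The following is a minimal Python sketch of how such figures are conventionally computed. It is not the authors' code; the function names and the toy labels are illustrative assumptions, and only the counts 668 and 6,348 come from the abstract.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        count_a[label] * count_b[label] for label in set(count_a) | set(count_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def f1_for_class(gold, predicted, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, predicted))
    fp = sum(g != label and p == label for g, p in zip(gold, predicted))
    fn = sum(g == label and p != label for g, p in zip(gold, predicted))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Share of IPV-report tweets, using the counts stated in the abstract.
positive_share = 668 / 6348

# Toy labels (hypothetical, for illustration only): 1 = IPV-report, 0 = non-IPV-report.
toy_gold = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

toy_kappa = cohens_kappa(toy_gold, toy_pred)
toy_f1_positive = f1_for_class(toy_gold, toy_pred, 1)
toy_f1_negative = f1_for_class(toy_gold, toy_pred, 0)
```

On imbalanced data such as this, the minority-class F-score is usually the more informative of the two, because a classifier that rarely predicts the positive class can still score highly on the majority class. The toy example shows the same pattern: the score for class 0 is much higher than for class 1 even though each class has the same number of errors.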