Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence.

Author information

Karystianis George, Cabral Rina Carines, Han Soyeon Caren, Poon Josiah, Butler Tony

Affiliations

School of Population Health, University of New South Wales, Sydney, NSW, Australia.

School of Computer Science, University of Sydney, Sydney, NSW, Australia.

Publication information

Front Digit Health. 2021 Feb 17;3:602683. doi: 10.3389/fdgth.2021.602683. eCollection 2021.

Abstract

Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, with one woman murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and the victim are lacking. The New South Wales Police Force attends thousands of FDV events each year and records details both as fixed fields (e.g., demographic information for the individuals involved in the event) and as text narratives that describe abuse types, victim injuries, and threats, as well as the mental health status of POIs and victims. The information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information from 492,393 FDV event narratives (abuse types, victim injuries, mental illness mentions), we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types ("hands-on," "hands-off," "Apprehended Domestic Violence Order (ADVO) breach"). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for "ADVO breach" in a multilabel classification setup, while the binary classification setup generated similar results. "Hands-off" offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups. "Hands-on" offenses benefitted least from the contextual information gained through BERT embeddings: for this offense type, the MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). The encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase performance, which can assist those working on FDV and in law enforcement to improve outcomes and better manage FDV events.

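The abstract contrasts a multilabel setup (one model predicting all three offense types at once) with binary setups (one task per offense type). The sketch below illustrates only that difference in target encoding and per-label accuracy. It is not the authors' code: the helper names (`to_multilabel`, `to_binary_tasks`, `per_label_accuracy`) and the four toy events are hypothetical and do not come from the study's data.

```python
from typing import Dict, List, Sequence

# The three offense types named in the abstract.
OFFENSE_TYPES = ("hands-on", "hands-off", "ADVO breach")


def to_multilabel(offenses: Sequence[str]) -> List[int]:
    """Encode the offense types of one event as a 0/1 vector (hypothetical helper)."""
    present = set(offenses)
    return [1 if label in present else 0 for label in OFFENSE_TYPES]


def to_binary_tasks(targets: List[List[int]]) -> Dict[str, List[int]]:
    """Split multilabel targets into one binary target list per offense type."""
    return {
        label: [row[i] for row in targets]
        for i, label in enumerate(OFFENSE_TYPES)
    }


def per_label_accuracy(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Dict[str, float]:
    """Fraction of events for which each label was predicted correctly."""
    n = len(y_true)
    return {
        label: sum(t[i] == p[i] for t, p in zip(y_true, y_pred)) / n
        for i, label in enumerate(OFFENSE_TYPES)
    }


# Toy, made-up events (assumption: an event may carry zero or more offense types).
y_true = [
    to_multilabel(["hands-on"]),
    to_multilabel(["hands-off", "ADVO breach"]),
    to_multilabel([]),
    to_multilabel(["hands-on", "ADVO breach"]),
]
# Made-up predictions standing in for the output of any multilabel model.
y_pred = [[1, 0, 0], [0, 0, 1], [0, 0, 0], [1, 1, 1]]

binary_targets = to_binary_tasks(y_true)
accuracy = per_label_accuracy(y_true, y_pred)
print(binary_targets)
print(accuracy)
```

In the multilabel setup a single model emits the whole three-element vector, whereas each list in `binary_targets` would be the target of a separate binary classifier; the per-label accuracy is computed the same way in both cases, which is what makes the two setups comparable.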