University of Michigan, Ann Arbor, MI, USA.
Measures for Justice, Rochester, NY, USA.
Sci Adv. 2023 Mar 3;9(9):eabq8123. doi: 10.1126/sciadv.abq8123.
Researchers working with administrative crime data often must classify offense narratives into a common scheme for analysis. No comprehensive standard currently exists, nor is there a mapping tool to transform raw descriptions into offense types. This paper introduces a new schema, the Uniform Crime Classification Standard (UCCS), and the Text-based Offense Classification (TOC) tool to address these shortcomings. The UCCS schema draws on existing efforts and aims to better reflect offense severity and improve type disambiguation. The TOC tool is a machine learning algorithm that uses a hierarchical, multilayer perceptron classification framework, built on 313,209 hand-coded offense descriptions from 24 states, to translate raw descriptions into UCCS codes. We test how variations in data processing and modeling approaches affect recall, precision, and F1 scores to assess their relative influence on model performance. The code scheme and classification tool are collaborations between Measures for Justice and the Criminal Justice Administrative Records System.
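To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how per-class precision, recall, and F1 can be computed from hand-coded and predicted labels. It is not the TOC tool's own evaluation code. The function name `per_class_scores` and the labels `theft`, `assault`, and `dui` are hypothetical placeholders, not actual UCCS codes.

```python
from collections import Counter


def per_class_scores(true_codes, predicted_codes):
    """Return {code: (precision, recall, f1)} for every code in either list.

    Hypothetical helper for illustration; not part of the TOC tool.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_codes, predicted_codes):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this code
        else:
            fp[guess] += 1   # predicted code was wrong
            fn[truth] += 1   # true code was missed

    scores = {}
    for code in set(true_codes) | set(predicted_codes):
        predicted = tp[code] + fp[code]   # times the code was predicted
        actual = tp[code] + fn[code]      # times the code was the truth
        precision = tp[code] / predicted if predicted else 0.0
        recall = tp[code] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[code] = (precision, recall, f1)
    return scores


# Hypothetical hand-coded labels and model predictions (not real UCCS codes).
truth = ["theft", "theft", "assault", "dui"]
preds = ["theft", "assault", "assault", "dui"]
scores = per_class_scores(truth, preds)
```

Here one `theft` description is misclassified as `assault`, which lowers recall for `theft` and precision for `assault` while leaving `dui` unaffected. Reporting the three metrics per class shows which offense types a change in data processing or modeling helps or hurts.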