Child and Adolescent Data Lab, University of Michigan, School of Social Work, 1080 S University Ave, Ann Arbor, MI, 48109, United States.
Indiana University School of Social Work, 902 West New York Street Indianapolis, Indiana, 46202, United States.
Child Abuse Negl. 2019 Dec;98:104180. doi: 10.1016/j.chiabu.2019.104180. Epub 2019 Sep 12.
State child welfare agencies collect, store, and manage vast amounts of data. However, they often lack the right data, or the data they hold are problematic or difficult to use to inform strategies for improving services and system processes. Much of this information is unstructured text, and considerable resources are required to read and code it. Data science and text mining offer potentially efficient and cost-effective strategies for maximizing the value of these data.
The current study tests the feasibility of using text mining for extracting information from unstructured text to better understand substance-related problems among families investigated for abuse or neglect.
A state child welfare agency provided written summaries from investigations of child abuse and neglect. Expert human reviewers coded 2956 investigation summaries based on whether the caseworker observed a substance-related problem. These coded documents were used to develop, train, and validate computer models that could perform the coding on an automated basis.
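The train-and-validate workflow described above can be illustrated with a minimal sketch. The abstract does not say which models the authors used, so the multinomial naive Bayes classifier below is a stand-in chosen for brevity, not the study's method. The example summaries, the label convention (1 = substance-related problem observed, 0 = not observed), and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior, then add one log likelihood per token.
            score = math.log(self.doc_counts[c] / total_docs)
            for token in tokenize(document):
                if token not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                score += math.log(
                    (self.word_counts[c][token] + 1) / (total_words + vocab_size)
                )
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical, invented summaries standing in for human-coded documents.
train_docs = [
    "mother admitted to using heroin and tested positive for opiates",
    "father appeared intoxicated and alcohol bottles were found in the home",
    "caregiver tested positive for methamphetamine during the investigation",
    "home was clean with adequate food and children appeared healthy",
    "school reported truancy and no safety concerns were observed in the home",
    "children were well supervised and the home had working utilities",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Held-out documents with human codes, used only for validation.
test_docs = [
    "mother tested positive for heroin",
    "home was clean and children were supervised",
]
test_labels = [1, 0]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
predictions = [model.predict(doc) for doc in test_docs]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

In practice the held-out set would be a random split of the human-coded summaries, so that accuracy is measured on documents the model never saw during training.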
A set of computer models achieved greater than 90% accuracy when judged against expert human reviewers. Fleiss' kappa estimates among computer models and expert human reviewers exceeded .80, indicating that the ratings of expert human reviewers are exchangeable with those of the computer models.
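For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Fleiss' kappa computed from scratch. It assumes every document is rated by the same number of raters (human reviewers and models alike) and that there are at least two raters. The function names and the toy ratings are hypothetical and are not data from the study.

```python
def to_counts(label_rows, categories):
    """Turn per-document rater labels into per-document category counts."""
    return [[row.count(c) for c in categories] for row in label_rows]


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of counts.

    counts[i][j] is the number of raters who assigned document i to
    category j. Every row must sum to the same number of raters.
    """
    n_docs = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every document needs the same number of ratings")

    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_docs * n_raters) for j in range(n_cats)
    ]
    # Observed agreement within each document, then averaged.
    p_doc = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_doc) / n_docs
    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: each row is one document, each entry one rater's code
# (1 = substance-related problem observed, 0 = not observed).
toy_ratings = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
toy_kappa = fleiss_kappa(to_counts(toy_ratings, categories=(0, 1)))
print(round(toy_kappa, 3))
```

Kappa equals 1 under perfect agreement and 0 when agreement is no better than chance, which is why values above .80 are read as strong agreement between the models and the human reviewers.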
These results provide compelling evidence that text mining procedures can be a cost-effective and efficient solution for extracting meaningful insights from unstructured text data. Additional research is necessary to understand how to extract actionable insights from these under-utilized stores of data in child welfare.