
A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances.

Author Information

Wang Song, Zhou Yiliang, Han Ziqiang, Tao Cui, Xiao Yunyu, Ding Ying, Ghosh Joydeep, Peng Yifan

Affiliations

Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA.

Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.

Publication Information

Commun Med (Lond). 2024 Oct 14;4(1):199. doi: 10.1038/s43856-024-00631-7.

Abstract

BACKGROUND

Data accuracy is essential for scientific research and policy development. National Violent Death Reporting System (NVDRS) data are widely used to discover patterns and causal factors of death. Recent studies have pointed to annotation inconsistencies within the NVDRS and their potential to produce erroneous suicide-circumstance attributions.

METHODS

We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify possible label errors. We analyzed 267,804 suicide death incidents between 2003 and 2020 from the NVDRS. We measured annotation inconsistency by the degree of change in the F-1 score.
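The comparison underlying this paradigm can be sketched as follows. The labels and predictions below are hypothetical placeholders, not the paper's data, but the F-1 delta computation mirrors how a state-specific shift in classifier performance would signal annotation inconsistency:

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F-1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions on a target state's test set from two classifiers:
# one trained without the target state's data, one trained with it included.
y_true           = [1, 1, 1, 0, 0, 1, 0, 1]
pred_without_tgt = [1, 0, 1, 0, 1, 0, 0, 1]
pred_with_tgt    = [1, 1, 1, 0, 0, 0, 0, 1]

delta = f1_score(y_true, pred_with_tgt) - f1_score(y_true, pred_without_tgt)
# A large positive delta on the target state, paired with a drop on other
# states, suggests the target state's annotations diverge from the
# cross-state consensus, flagging possible label errors.
```

The same delta would be computed on the other states' test sets; it is the pattern of gains and losses across held-out states, not any single score, that indicates inconsistency.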

RESULTS

Our results show that incorporating the target state's data into training the suicide-circumstance classifier increases the F-1 score by 5.4% on the target state's test set and decreases it by 1.1% on the other states' test sets.

CONCLUSIONS

To conclude, we present an NLP framework to detect annotation inconsistencies, demonstrate the effectiveness of identifying and rectifying possible label errors, and propose a solution to improve the coding consistency of human annotators.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c10/11471859/73a1f86c5cbd/43856_2024_631_Fig1_HTML.jpg
