Wang Song, Zhou Yiliang, Han Ziqiang, Tao Cui, Xiao Yunyu, Ding Ying, Ghosh Joydeep, Peng Yifan
Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA.
Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Commun Med (Lond). 2024 Oct 14;4(1):199. doi: 10.1038/s43856-024-00631-7.
Data accuracy is essential for scientific research and policy development. Data from the National Violent Death Reporting System (NVDRS) are widely used to discover patterns and causes of death. Recent studies have suggested that annotation inconsistencies exist within the NVDRS and that they may lead to erroneous suicide-circumstance attributions.
We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify possible label errors. We analyzed 267,804 suicide death incidents recorded in the NVDRS between 2003 and 2020. We measured annotation inconsistency by the degree of change in the F-1 score.
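As a rough illustration of how a cross-validation-like paradigm can surface possible label errors, the sketch below holds out each fold in turn, trains on the remaining folds, and flags held-out records whose predicted label disagrees with the annotated one. It is a minimal sketch under stated assumptions, not the study's implementation: the token-vote classifier is a toy stand-in for whatever model the authors trained, and every function name, the fold assignment, and the example narratives are hypothetical.

```python
from collections import Counter

def train_token_scores(texts, labels):
    """Toy stand-in classifier (an assumption): each token votes +1 per
    positive document and -1 per negative document it appears in."""
    scores = Counter()
    for text, label in zip(texts, labels):
        for tok in set(text.lower().split()):
            scores[tok] += 1 if label == 1 else -1
    return scores

def predict(scores, text):
    """Predict 1 when the summed token votes are positive, else 0."""
    total = sum(scores[tok] for tok in set(text.lower().split()))
    return 1 if total > 0 else 0

def flag_possible_label_errors(texts, labels, k=3):
    """Return indices whose out-of-fold prediction disagrees with the label."""
    n = len(texts)
    flagged = []
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        held_idx = [i for i in range(n) if i % k == fold]
        scores = train_token_scores(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        for i in held_idx:
            if predict(scores, texts[i]) != labels[i]:
                flagged.append(i)
    return sorted(flagged)

# Hypothetical narratives; label 1 = "job problem" circumstance present.
example_texts = [
    "lost job recently",
    "no known problems",
    "lost job last month",
    "no known issues",
    "lost job and home",          # deliberately mislabeled as 0
    "no known problems reported",
    "lost job this year",
    "no known issues reported",
    "lost job in spring",
]
example_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1]

flagged = flag_possible_label_errors(example_texts, example_labels, k=3)
print(flagged)
```

Flagged records are only candidates for review: a disagreement can reflect a model error as easily as an annotation error, so a human check is still needed before any label is changed.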
Our results show that incorporating the target state's data into the training of the suicide-circumstance classifier increases the F-1 score by 5.4% on the target state's test set and decreases it by 1.1% on the other states' test sets.
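The comparison above can be sketched as a pair of F-1 differences: one on the target state's test set and one on the other states' test sets, each taken between a classifier trained with the target state's data and one trained without it. The snippet below is a minimal sketch assuming binary labels and predictions that are already available; the function names and the toy inputs are hypothetical and do not reproduce the reported figures.

```python
def f1_score(y_true, y_pred):
    """Binary F-1 computed from true-positive, false-positive, and false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_changes(y_target, base_pred_target, aug_pred_target,
               y_other, base_pred_other, aug_pred_other):
    """Return (change on target-state test set, change on other-states test set).

    'base' predictions come from a classifier trained without the target
    state's data; 'aug' predictions come from one trained with it.
    """
    delta_target = f1_score(y_target, aug_pred_target) - f1_score(y_target, base_pred_target)
    delta_other = f1_score(y_other, aug_pred_other) - f1_score(y_other, base_pred_other)
    return delta_target, delta_other

# Toy inputs (hypothetical): adding target-state data helps on the target
# state's test set and hurts on the other states' test set.
delta_target, delta_other = f1_changes(
    y_target=[1, 1, 0, 0], base_pred_target=[1, 0, 0, 0], aug_pred_target=[1, 1, 0, 0],
    y_other=[1, 0, 1, 0], base_pred_other=[1, 0, 1, 0], aug_pred_other=[1, 0, 0, 0],
)
print(round(delta_target, 3), round(delta_other, 3))
```

A gain on the target state paired with a loss elsewhere is the pattern the abstract reads as a sign that the target state's annotations follow conventions that differ from those of the other states.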
To conclude, we present an NLP framework to detect annotation inconsistencies, show its effectiveness in identifying and rectifying possible label errors, and propose a solution to improve the coding consistency of human annotators.