Inter-Annotator Agreement and the Upper Limit on Machine Performance: Evidence from Biomedical Natural Language Processing.

Author information

Boguslav Mayla, Cohen Kevin Bretonnel

Affiliations

Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA.

Publication information

Stud Health Technol Inform. 2017;245:298-302.

PMID: 29295103

Abstract

Human-annotated data is a fundamental part of natural language processing system development and evaluation. The quality of that data is typically assessed by calculating the agreement between the annotators. It is widely assumed that this agreement between annotators is the upper limit on system performance in natural language processing: if humans can't agree with each other about the classification more than some percentage of the time, we don't expect a computer to do any better. We trace the logical positivist roots of the motivation for measuring inter-annotator agreement, demonstrate the prevalence of the widely-held assumption about the relationship between inter-annotator agreement and system performance, and present data that suggest that inter-annotator agreement is not, in fact, an upper bound on language processing system performance.
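The agreement calculation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which agreement statistic the authors used, so the choice of raw percent agreement and Cohen's kappa here is an assumption, and the label lists are invented purely for illustration.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    all_labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical annotations, not data from the paper.
annotator_1 = ["gene", "gene", "other", "gene", "other", "other", "gene", "other"]
annotator_2 = ["gene", "other", "other", "gene", "other", "gene", "gene", "other"]

print(observed_agreement(annotator_1, annotator_2))  # 0.75
print(cohen_kappa(annotator_1, annotator_2))         # 0.5
```

A system's output scored against an adjudicated gold standard could be passed through the same functions, which is the kind of comparison the assumption about an upper bound rests on.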


Similar articles

Inter-Annotator Agreement and the Upper Limit on Machine Performance: Evidence from Biomedical Natural Language Processing.

Stud Health Technol Inform. 2017;245:298-302.

RysannMD: A biomedical semantic annotator balancing speed and accuracy.

J Biomed Inform. 2017 Jul;71:91-109. doi: 10.1016/j.jbi.2017.05.016. Epub 2017 May 26.

Community annotation experiment for ground truth generation for the i2b2 medication challenge.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):519-23. doi: 10.1136/jamia.2010.004200.

A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

J Am Med Inform Assoc. 2015 Sep;22(5):948-56. doi: 10.1093/jamia/ocv037. Epub 2015 May 6.

Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.

J Biomed Inform. 2017 May;69:203-217. doi: 10.1016/j.jbi.2017.04.006. Epub 2017 Apr 9.

Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx.

AMIA Annu Symp Proc. 2015 Nov 5;2015:1296-305. eCollection 2015.

Regular expression-based learning to extract bodyweight values from clinical notes.

J Biomed Inform. 2015 Apr;54:186-90. doi: 10.1016/j.jbi.2015.02.009. Epub 2015 Mar 5.

Semantic Relations in Compound Nouns: Perspectives from Inter-Annotator Agreement.

Stud Health Technol Inform. 2017;245:644-648.

A natural language processing pipeline for pairing measurements uniquely across free-text CT reports.

J Biomed Inform. 2015 Feb;53:36-48. doi: 10.1016/j.jbi.2014.08.015. Epub 2014 Sep 6.

Generalizability of Readability Models for Medical Terms.

Stud Health Technol Inform. 2019 Aug 21;264:1327-1331. doi: 10.3233/SHTI190442.

Cited by

Enhance health evidence quality in classification tasks: A triangulation approach utilizing case-based reasoning and process features.

Digit Health. 2025 Jan 17;11:20552076251314097. doi: 10.1177/20552076251314097. eCollection 2025 Jan-Dec.

Identifying and classifying goals for scientific knowledge.

Bioinform Adv. 2021 Jul 28;1(1):vbab012. doi: 10.1093/bioadv/vbab012. eCollection 2021.

Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology.

JMIR Med Inform. 2021 Jul 23;9(7):e20492. doi: 10.2196/20492.

Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine.

BMC Med Inform Decis Mak. 2020 Apr 6;20(1):64. doi: 10.1186/s12911-020-1079-2.
