Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
J Am Med Inform Assoc. 2014 May-Jun;21(3):406-13. doi: 10.1136/amiajnl-2013-001837. Epub 2013 Sep 3.
OBJECTIVE: To present a series of experiments that (1) evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements and (2) test for potential bias when pre-annotation is used.
METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated annotation time and potential bias using F-measures and ANOVA tests with Bonferroni correction.
RESULTS: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. Pre-annotation produced no statistically significant difference in IAA or annotator performance.
CONCLUSIONS: In every experiment pair, the annotator working with pre-annotated text needed less time than the annotator working with non-labeled text, and the time savings were statistically significant. Pre-annotation did not reduce IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical way to reduce the cost of annotating clinical named entities in the eligibility sections of clinical trial announcements without introducing bias into the annotation process.
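The abstract reports agreement as F-measures and applies a Bonferroni correction, but does not give the matching rule or the number of comparisons. The following is a minimal sketch of how such figures are commonly computed, not the authors' implementation. It assumes exact matching on span offsets and entity label; the names `Span`, `f_measure`, and `bonferroni_alpha` and the sample spans are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def f_measure(reference: Set[Span], candidate: Set[Span]) -> float:
    """Balanced F-measure, assuming exact span-and-label matching."""
    if not reference and not candidate:
        return 1.0  # two empty annotation sets agree trivially
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Illustrative data only: the third span differs by one character offset,
# so it does not count as a match under exact matching.
annotator_a = {(0, 8, "diagnosis"), (15, 20, "symptom"), (30, 42, "sign")}
annotator_b = {(0, 8, "diagnosis"), (15, 20, "symptom"), (31, 42, "sign")}

agreement = f_measure(annotator_a, annotator_b)
threshold = bonferroni_alpha(0.05, 5)
print(f"IAA (F-measure): {agreement:.3f}")
print(f"Per-test alpha for 5 comparisons: {threshold:.3f}")
```

Because the balanced F-measure is symmetric in precision and recall, swapping which annotator is treated as the reference leaves the agreement score unchanged, which is one reason it is often used for IAA on entity spans.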