

Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements.

Affiliations

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

Publication

J Am Med Inform Assoc. 2014 May-Jun;21(3):406-13. doi: 10.1136/amiajnl-2013-001837. Epub 2013 Sep 3.


DOI: 10.1136/amiajnl-2013-001837
PMID: 24001514
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3994857/
Abstract

OBJECTIVE: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias, if pre-annotation is utilized.

METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated the annotation time and potential bias through F-measures and ANOVA tests and implemented Bonferroni correction.

RESULTS: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. There was no statistically significant difference for IAA and annotator performance in pre-annotations.

CONCLUSIONS: On every experiment pair, the annotator with the pre-annotated text needed less time to annotate than the annotator with non-labeled text. The time savings were statistically significant. Moreover, the pre-annotation did not reduce the IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotation of clinical named entity recognition in the eligibility sections of clinical trial announcements without introducing bias in the annotation process.
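The dictionary-based pre-annotation described in the Methods can be sketched as a longest-match lookup over a term list. The dictionary entries and the `pre_annotate` helper below are illustrative assumptions, not the authors' actual UMLS/SNOMED CT pipeline:

```python
import re

# Hypothetical mini-dictionary: surface form -> entity type.
# A real system would draw terms from UMLS / SNOMED CT.
DICTIONARY = {
    "type 2 diabetes": "Diagnosis",
    "fever": "Symptom",
    "elevated blood pressure": "Sign",
}

def pre_annotate(text):
    """Return (start, end, surface, label) spans via longest-match lookup."""
    spans = []
    # Try longer terms first so "type 2 diabetes" beats a shorter substring.
    for term in sorted(DICTIONARY, key=len, reverse=True):
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            # Skip matches overlapping an already-claimed span.
            if not any(m.start() < e and s < m.end() for s, e, *_ in spans):
                spans.append((m.start(), m.end(), m.group(), DICTIONARY[term]))
    return sorted(spans)

spans = pre_annotate("Patients with type 2 diabetes and fever were excluded.")
```

Annotators would then correct these proposed spans rather than labeling from scratch, which is where the reported time savings come from.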

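The F-measure used as an inter-annotator agreement (IAA) metric and the Bonferroni correction mentioned in the abstract can be sketched as follows. Exact-span matching and the simple alpha/m threshold are assumptions; the paper's statistical setup may differ in detail:

```python
def pairwise_f1(spans_a, spans_b):
    """F-measure between two annotators' span sets (exact match).
    The score is symmetric in its arguments, so it serves as an IAA estimate."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    tp = len(a & b)                      # spans both annotators produced
    precision = tp / len(b) if b else 0.0
    recall = tp / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def bonferroni(p_values, alpha=0.05):
    """Flag which of m hypotheses stay significant at the alpha/m threshold."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

f1 = pairwise_f1(
    {(0, 5, "Sign"), (10, 15, "Symptom")},
    {(0, 5, "Sign"), (20, 25, "Diagnosis")},
)
```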

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a5c/3994857/0c5b8890dcfc/amiajnl-2013-001837f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a5c/3994857/c6076cecfbfe/amiajnl-2013-001837f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a5c/3994857/dc71c916de56/amiajnl-2013-001837f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a5c/3994857/4700f2b87ff6/amiajnl-2013-001837f03.jpg

Similar articles

[1]
Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements.

J Am Med Inform Assoc. 2013-9-3

[2]
Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing.

J Med Internet Res. 2013-4-2

[3]
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

J Am Med Inform Assoc. 2015-9

[4]
A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine.

BMC Med Inform Decis Mak. 2021-2-22

[5]
Accelerating the annotation of sparse named entities by dynamic sentence selection.

BMC Bioinformatics. 2008-11-19

[6]
Quantitative analysis of manual annotation of clinical text samples.

Int J Med Inform. 2018-12-31

[7]
NCBI disease corpus: a resource for disease name recognition and concept normalization.

J Biomed Inform. 2014-2

[8]
Building gold standard corpora for medical natural language processing tasks.

AMIA Annu Symp Proc. 2012

[9]
Assisted annotation of medical free text using RapTAT.

J Am Med Inform Assoc. 2014-1-15

[10]
CUILESS2016: a clinical corpus applying compositional normalization of text mentions.

J Biomed Semantics. 2018-1-10

Cited by

[1]
A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case.

Sci Data. 2025-1-29

[2]
CACER: Clinical concept Annotations for Cancer Events and Relations.

J Am Med Inform Assoc. 2024-11-1

[3]
Utilizing active learning strategies in machine-assisted annotation for clinical named entity recognition: a comprehensive analysis considering annotation costs and target effectiveness.

J Am Med Inform Assoc. 2024-11-1

[4]
A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry.

Graefes Arch Clin Exp Ophthalmol. 2023-11

[5]
MedLexSp - a medical lexicon for Spanish medical natural language processing.

J Biomed Semantics. 2023-2-2

[6]
Adverse drug event detection using natural language processing: A scoping review of supervised learning methods.

PLoS One. 2023

[7]
The OpenDeID corpus for patient de-identification.

Sci Rep. 2021-10-7

[8]
Identification of social determinants of health using multi-label classification of electronic health record clinical notes.

JAMIA Open. 2021-2-9

[9]
A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine.

BMC Med Inform Decis Mak. 2021-2-22

[10]
Adverse drug event presentation and tracking (ADEPT): semiautomated, high throughput pharmacovigilance using real-world data.

JAMIA Open. 2020-8-31

