Clinical text annotation - what factors are associated with the cost of time?

Authors

Wei Qiang, Franklin Amy, Cohen Trevor, Xu Hua

Affiliation

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

AMIA Annu Symp Proc. 2018 Dec 5;2018:1552-1560. eCollection 2018.

Abstract

Building high-quality annotated clinical corpora is necessary for developing statistical natural language processing (NLP) models that unlock information embedded in clinical text, but it is also time-consuming and expensive. Consequently, it is important to identify factors that may affect annotation time, such as the syntactic complexity of the text to be annotated and the vagaries of individual user behavior. However, limited work has been done to understand the annotation of clinical text. In this study, we investigated how factors inherent to the text affect annotation time for a named entity recognition (NER) task. We recruited 9 users to annotate a clinical corpus and recorded the annotation time for each sample. We then defined a set of factors that we hypothesized might affect annotation time and fitted them to a linear regression model to predict annotation time. The model achieved an R² of 0.611 and revealed eight time-associated factors, including characteristics of sentences, individual users, and annotation order, with implications for the practice of annotation and the development of cost models for active learning research.
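The modeling step described in the abstract can be illustrated with a minimal sketch: an ordinary least squares fit of annotation time on a few per-sentence factors, followed by the coefficient of determination. This is not the authors' code or data. The three factors (token count, entity count, annotation order), the sample values, and the helper names `fit_ols`, `predict`, and `r_squared` are all assumptions made for illustration.

```python
# Illustrative sketch only: synthetic data and hypothetical factors,
# not the feature set or measurements used in the study.

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, coef_1, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def r_squared(coefs, rows, y):
    """Coefficient of determination on the given samples."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(coefs, r)) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Each row: (token count, entity count, annotation order). Hypothetical.
rows = [(12, 1, 1), (30, 4, 2), (8, 0, 3), (22, 2, 4),
        (15, 3, 5), (40, 5, 6), (18, 1, 7), (27, 2, 8)]
# Made-up annotation times in seconds for the rows above.
times = [11.0, 29.5, 6.0, 18.0, 19.5, 35.0, 12.5, 20.0]

coefs = fit_ols(rows, times)
r2 = r_squared(coefs, rows, times)
print("coefficients:", [round(c, 3) for c in coefs])
print("R^2:", round(r2, 3))
```

In a sketch like this, the sign and size of each coefficient suggest how the corresponding factor relates to annotation time, which is the kind of association the study reports. A real analysis would also need categorical terms for individual users and significance tests for each factor.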
