Wac Marceli, Santos-Rodriguez Raul, McWilliams Chris, Bourdeaux Christopher
Faculty of Engineering, University of Bristol, Bristol, United Kingdom.
University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, United Kingdom.
JMIR Hum Factors. 2025 Feb 5;12:e56880. doi: 10.2196/56880.
Increasing use of computational methods in health care provides opportunities to address previously unsolvable problems. Machine learning techniques applied to routinely collected data can enhance clinical tools and improve patient outcomes, but their effective deployment comes with significant challenges. While some tasks can be addressed by training machine learning models directly on the collected data, more complex problems require additional input in the form of data annotations. Data annotation is a complex and time-consuming task that requires domain expertise and, frequently, technical proficiency. Because clinicians' time is an extremely limited resource, existing tools fail to provide an annotation workflow that can be deployed effectively in health care.
This paper investigates how intensive care unit staff approach the task of data annotation. Specifically, it aims to (1) understand clinicians' annotation strategies and (2) capture the requirements for a digital annotation tool for the health care setting.
We conducted an experimental activity in which 7 intensive care unit clinicians annotated printed excerpts of real time-series admission data. During a single 45-minute workshop, each participant annotated an identical set of admissions, marking the periods of weaning from mechanical ventilation. Participants were observed during task completion, and their actions were analyzed within Norman's Interaction Cycle model to identify software requirements.
Clinicians followed a cyclic process of investigation, annotation, data reevaluation, and label refinement, using a variety of techniques to investigate the data and create annotations. We identified 11 requirements for the digital tool across 4 domains: annotation of individual admissions (n=5), semiautomated annotation (n=3), operational constraints (n=2), and use of labels in machine learning (n=1).
Effective data annotation in a clinical setting relies on flexibility in analysis and label creation, as well as on workflow continuity across multiple admissions. There is a need to ensure a seamless transition between data investigation, annotation, and refinement of labels.
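As a concrete illustration of the annotation task described in the Methods, the sketch below shows one possible way to represent clinician-marked weaning periods over an admission time series and convert them into per-timestep labels for supervised machine learning. All class and field names (WeaningAnnotation, Admission, fio2, per_timestep_labels) are hypothetical assumptions for illustration; the paper does not specify the study's data model or the tool's implementation.

```python
# Hypothetical sketch: interval annotations of weaning from mechanical
# ventilation on an ICU admission time series, converted to per-timestep
# binary labels of the kind a supervised model could be trained on.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class WeaningAnnotation:
    """One clinician-marked period of weaning from mechanical ventilation."""
    start: datetime
    end: datetime
    annotator_id: str          # which clinician created the label
    note: str = ""             # optional free-text rationale


@dataclass
class Admission:
    """A single ICU admission with hourly time-series observations."""
    admission_id: str
    timestamps: List[datetime]                     # one entry per observation
    fio2: List[float]                              # example routinely collected signal
    annotations: List[WeaningAnnotation] = field(default_factory=list)

    def per_timestep_labels(self) -> List[int]:
        """Mark each timestep 1 if it falls inside any weaning interval, else 0."""
        return [
            int(any(a.start <= t < a.end for a in self.annotations))
            for t in self.timestamps
        ]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)
    stamps = [t0 + timedelta(hours=h) for h in range(6)]
    adm = Admission(
        admission_id="demo-001",
        timestamps=stamps,
        fio2=[0.6, 0.5, 0.4, 0.35, 0.3, 0.3],
        annotations=[
            WeaningAnnotation(stamps[2], stamps[5], annotator_id="clinician-A")
        ],
    )
    print(adm.per_timestep_labels())  # -> [0, 0, 1, 1, 1, 0]
```

Keeping annotations as intervals rather than raw per-timestep flags mirrors the cyclic workflow reported in the Results: interval boundaries can be refined repeatedly as clinicians reevaluate the data, and the derived labels are regenerated only when needed for model training.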