Zhang Wenjin, Li Keyi, Yang Sen, Yuan Sifan, Marsic Ivan, Sippel Genevieve J, Kim Mary S, Burd Randall S
Rutgers University.
Waymo.
Conf Comput Vis Pattern Recognit Workshops. 2024 Jun;2024:4950-4958. doi: 10.1109/cvprw63382.2024.00500. Epub 2024 Sep 27.
Trauma is a leading cause of mortality worldwide, with about 20% of these deaths being preventable. Most of these preventable deaths result from errors during the initial resuscitation of injured patients. Decision support has been evaluated as an approach to support teams during this phase to reduce errors. Existing systems require manual data entry and monitoring, which makes tasks challenging to accomplish in a time-critical setting. This paper identified the specific challenges of achieving effective decision support in trauma resuscitation based on computer vision techniques, including complex backgrounds, crowded scenes, fine-grained activities, and a scarcity of labeled data. To address the first three challenges, the proposed system involved an actor tracker that identifies individuals, allowing the system to focus on actor-specific features. Video Masked Autoencoder (Video-MAE) was used to overcome the issue of insufficient labeled data. This approach enables self-supervised learning using unlabeled video content, improving feature representation for medical activities. For more reliable performance, an ensemble fusion method was introduced. This technique combines predictions from consecutive video clips and different actors. Our method outperformed existing approaches in identifying fine-grained activities, providing a solution for activity recognition in trauma resuscitation and similar complex domains.
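The ensemble fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so this sketch assumes a plain average of per-class scores over all actor and clip pairs; the function name `fuse_predictions` and the `scores[actor][clip][class]` layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_predictions(scores: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Fuse per-class scores indexed as scores[actor][clip][class].

    Assumption: fusion is an unweighted mean over every (actor, clip) pair.
    The paper's actual fusion rule may differ.
    """
    if not scores or not scores[0]:
        raise ValueError("expected at least one actor with at least one clip")

    num_classes = len(scores[0][0])
    totals = [0.0] * num_classes
    count = 0

    for actor_clips in scores:          # one entry per tracked actor
        for clip_scores in actor_clips:  # one entry per consecutive clip
            if len(clip_scores) != num_classes:
                raise ValueError("every clip must score the same classes")
            for class_index, score in enumerate(clip_scores):
                totals[class_index] += score
            count += 1

    # Average over all (actor, clip) pairs to smooth out single-clip errors.
    return [total / count for total in totals]


# Two actors, two consecutive clips each, two activity classes.
example_scores = [
    [[0.75, 0.25], [0.5, 0.5]],   # actor A
    [[0.25, 0.75], [0.0, 1.0]],   # actor B
]
fused = fuse_predictions(example_scores)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Averaging is only one possible choice. A weighted mean or a per-actor maximum would fit the same interface if some actors or clips are more informative than others.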