Ershad Marzieh, Rege Robert, Fey Ann Majewicz
Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:1829-1832. doi: 10.1109/EMBC.2018.8512593.
A gold standard in surgical skill rating and evaluation is direct observation, in which a group of experts rates trainees on a Likert scale while observing their performance during a surgical task. This method is time and resource intensive. To alleviate this burden, many studies have focused on automatic surgical skill assessment; however, the metrics suggested in the literature for automatic evaluation do not capture the stylistic behavior of the user. In addition, very few studies focus on automatic rating of surgical skills based on available Likert scales. In a previous study, we presented a stylistic behavior lexicon for surgical skill. In this study, we evaluate the lexicon's ability to automatically rate robotic surgical skill, based on the 6 domains in the Global Evaluative Assessment of Robotic Skills (GEARS). Fourteen subjects of different skill levels performed two surgical tasks on a da Vinci surgical simulator. Several measurements were acquired as subjects performed the tasks, including limb (hand and arm) kinematics and joint (shoulder, elbow, wrist) positions. Posture videos of the subjects performing the tasks, as well as videos of the tasks being performed, were viewed and rated by faculty experts based on the 6 domains in GEARS. The paired videos were also rated via crowd-sourcing based on our stylistic behavior lexicon. Two separate regression learner models, one using the sensor measurements and the other using crowd ratings for our proposed lexicon, were trained for each domain in GEARS. The results indicate that the scores predicted by both models agree with the gold standard faculty ratings.
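The per-domain modeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which regression learner, features, or validation scheme were used, so everything below is an assumption made for illustration: a one-predictor ordinary least squares fit, leave-one-out validation over trials, a 1 to 5 rating scale, and purely synthetic data (28 trials, matching 14 subjects times 2 tasks). The domain names come from the published GEARS tool rather than from the abstract, and all function names are hypothetical.

```python
import random
import statistics

# The six GEARS domains (named from the published GEARS tool, not from the abstract).
GEARS_DOMAINS = [
    "depth_perception", "bimanual_dexterity", "efficiency",
    "force_sensitivity", "autonomy", "robotic_control",
]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out_predictions(xs, ys):
    """Predict each trial from a model fit on all remaining trials."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def mean_abs_error(ys, preds):
    """Mean absolute difference between expert scores and predictions."""
    return statistics.fmean(abs(y - p) for y, p in zip(ys, preds))


def make_synthetic_trials(n_trials=28, seed=0):
    """Synthetic stand-in data: one hypothetical predictor per trial
    (e.g. a normalized crowd rating) and a noisy expert score on an
    assumed 1-5 scale. These are NOT the study's data."""
    rng = random.Random(seed)
    data = {}
    for domain in GEARS_DOMAINS:
        xs = [rng.random() for _ in range(n_trials)]
        ys = [min(5.0, max(1.0, 1.0 + 4.0 * x + rng.gauss(0.0, 0.3)))
              for x in xs]
        data[domain] = (xs, ys)
    return data


def evaluate_all_domains(data):
    """Train and validate one regression model per GEARS domain."""
    return {
        domain: mean_abs_error(ys, leave_one_out_predictions(xs, ys))
        for domain, (xs, ys) in data.items()
    }


results = evaluate_all_domains(make_synthetic_trials())
for domain, mae in results.items():
    print(f"{domain}: leave-one-out MAE = {mae:.2f}")
```

In a setting like the one described, the same loop would be run twice, once with sensor-derived features and once with crowd ratings of the lexicon, and validation would more plausibly hold out whole subjects rather than single trials so that both trials from one subject never appear on both sides of the split.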