Dubin Ariel K, Smith Roger, Julian Danielle, Tanaka Alyssa, Mattingly Patricia
Department of Obstetrics and Gynecology, Columbia University Medical Center, New York, New York.
Florida Hospital Nicholson Center, Celebration, Florida.
J Minim Invasive Gynecol. 2017 Nov-Dec;24(7):1184-1189. doi: 10.1016/j.jmig.2017.07.019. Epub 2017 Jul 27.
To determine whether robotic virtual reality simulator performance assessments differ from those of validated human reviewers. Current surgical education relies heavily on simulation. Several assessment tools are available to the trainee, including the robotic simulators' built-in assessment metrics and the Global Evaluative Assessment of Robotic Skills (GEARS), both of which have been independently validated. GEARS is a rating scale through which human evaluators score trainees' performances in 6 domains: depth perception, bimanual dexterity, efficiency, force sensitivity, autonomy, and robotic control. Each domain is scored on a 5-point Likert scale with anchors. We used 2 common robotic simulators, the dV-Trainer (dVT; Mimic Technologies Inc., Seattle, WA) and the da Vinci Skills Simulator (dVSS; Intuitive Surgical, Sunnyvale, CA), to compare the simulators' performance metrics with GEARS scores for a basic robotic task on each simulator.
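The GEARS scoring described above can be illustrated with a minimal sketch. The six domain names follow the abstract; summing the six 1-5 Likert scores into a total (range 6-30) is an assumption based on the common GEARS convention, not a detail taken from this study.

```python
# Hedged sketch of a GEARS-style tally: six domains, each scored 1-5 on a
# Likert scale, summed into a single total. Summation (range 6-30) is an
# assumed convention, not a detail reported in this abstract.
GEARS_DOMAINS = (
    "depth_perception", "bimanual_dexterity", "efficiency",
    "force_sensitivity", "autonomy", "robotic_control",
)

def gears_total(scores):
    """Sum the six 1-5 domain scores into one GEARS total."""
    assert set(scores) == set(GEARS_DOMAINS), "all six domains required"
    assert all(1 <= v <= 5 for v in scores.values()), "Likert range is 1-5"
    return sum(scores.values())

example = {d: 3 for d in GEARS_DOMAINS}  # a hypothetical mid-range trainee
print(gears_total(example))  # 6 domains x 3 points each
```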
A prospective single-blinded randomized study.
A surgical education and training center.
Surgeons and surgeons in training.
Demographic information was collected, including sex, age, level of training, specialty, and previous surgical and simulator experience. After randomization and warm-up exercises, subjects performed 2 trials of Ring and Rail 1 (RR1) on each of the 2 simulators (dVSS and dVT). The second RR1 trial on each simulator was recorded, and the deidentified videos were sent to human reviewers, who scored them using GEARS. Eight different simulator assessment metrics were identified and each was paired with a similar performance metric in the GEARS tool. The GEARS evaluation scores and simulator assessment scores were paired, and a Spearman rho was calculated for each pair to assess their level of correlation.
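The correlation step above can be sketched in a few lines. The paired scores below are hypothetical, not data from the study; the sketch simply shows a Spearman rho (Pearson correlation of average ranks, which handles tied Likert scores) for one assumed metric pair.

```python
# Minimal Spearman rho, assuming hypothetical paired scores for one
# metric pair: GEARS "efficiency" (1-5 Likert) vs. simulator "time to
# complete" (seconds). Data are illustrative, not from the study.
from statistics import mean

def average_ranks(values):
    """Assign 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1  # average of the tied 1-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

gears_efficiency = [5, 4, 4, 3, 2, 2, 1]          # 1-5 Likert ratings
time_to_complete = [61, 75, 80, 95, 120, 131, 150]  # seconds

# Faster completion (lower time) should accompany a higher efficiency
# rating, so a strong *negative* rho is expected for this pairing.
print(f"Spearman rho = {spearman_rho(gears_efficiency, time_to_complete):.2f}")
```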
Seventy-four subjects were enrolled in this randomized study; 9 were excluded for missing or incomplete data. Strong correlations (rho coefficients ≥0.70) were found between GEARS scores and simulator metric scores for time to complete versus efficiency, time to complete versus total score, economy of motion versus depth perception, and overall score versus total score; all were significant (p < .0001). Weakly correlated pairs (rho ≥0.30) were bimanual dexterity versus economy of motion, efficiency versus master workspace range, bimanual dexterity versus master workspace range, and robotic control versus instrument collisions.
On basic virtual reality tasks, several simulator metrics align well with GEARS scores assigned by human reviewers, but others do not. Identifying these matches and mismatches can improve training and assessment when using robotic surgical simulators.