Anand Malpani, S. Swaroop Vedula, Chi Chiung Grace Chen, Gregory D. Hager
Johns Hopkins University, 3400 N Charles St, Hackerman Hall Room 200, Baltimore, MD, USA.
Int J Comput Assist Radiol Surg. 2015 Sep;10(9):1435-47. doi: 10.1007/s11548-015-1238-6. Epub 2015 Jun 30.
Currently available methods for surgical skills assessment are either subjective or provide only global evaluations of the overall task. Such global evaluations do not tell trainees where in the task they need to perform better. In this study, we investigated the reliability and validity of a framework that generates objective skill assessments for segments within a task, and we compared assessments from our framework, computed using crowdsourced segment ratings from surgically untrained individuals and from expert surgeons, against manually assigned global rating scores.
Our framework includes (1) training a binary classifier to generate preferences for pairs of task segments (i.e., given a pair of segments, specifying which one was performed better), (2) computing segment-level percentile scores from these preferences, and (3) predicting task-level scores from the segment-level scores. We conducted a crowdsourcing user study to obtain manual preferences for segments within a suturing and knot-tying task from a crowd of surgically untrained individuals and from a group of expert surgeons. We analyzed the inter-rater reliability of the preferences obtained from the crowd and from the experts, and we investigated the validity of the task-level scores obtained using our framework. In addition, we compared the accuracy of the crowd and expert preference classifiers, as well as the segment- and task-level scores obtained from the two classifiers.
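To make the three steps concrete, the following is a minimal sketch in Python; it is not the authors' implementation, and the feature representation, the choice of logistic and linear regression, and all function and variable names are illustrative assumptions.

```python
# Minimal sketch of the three-stage framework (illustrative assumptions only:
# feature vectors per segment, logistic regression for pairwise preferences,
# linear regression for task-level scores).
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression, LinearRegression


def train_preference_classifier(segment_features, preferences):
    """Step 1: binary classifier over pairs of segments.

    segment_features: dict mapping segment id -> 1-D feature vector.
    preferences: list of (seg_a, seg_b, label) tuples, with label = 1 if
    seg_a was rated as performed better than seg_b, else 0.
    """
    X = [np.concatenate([segment_features[a], segment_features[b]])
         for a, b, _ in preferences]
    y = [label for _, _, label in preferences]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf


def segment_percentile_scores(clf, segment_features):
    """Step 2: score each segment by the fraction of other segments it is
    predicted to outperform (a percentile-style score in [0, 100])."""
    ids = list(segment_features)
    wins = {s: 0 for s in ids}
    for a, b in combinations(ids, 2):
        pair = np.concatenate([segment_features[a],
                               segment_features[b]]).reshape(1, -1)
        if clf.predict(pair)[0] == 1:   # segment a preferred over segment b
            wins[a] += 1
        else:
            wins[b] += 1
    return {s: 100.0 * w / (len(ids) - 1) for s, w in wins.items()}


def fit_task_score_model(per_trial_segment_scores, task_grs):
    """Step 3: regress the task-level global rating score (GRS, 6-30) on the
    vector of segment-level scores for each trial (the same number of
    segments per trial is assumed here)."""
    reg = LinearRegression()
    reg.fit(np.asarray(per_trial_segment_scores), np.asarray(task_grs))
    return reg
```

In this sketch, a segment's percentile score is simply the share of other segments it is predicted to outperform, and the task-level GRS is regressed on the vector of segment-level scores; the published framework may differ in both choices.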
We observed moderate inter-rater reliability within the crowd (Fleiss' kappa, κ = 0.41) and within the experts (κ = 0.55). For both the crowd and the experts, the accuracy of an automated classifier trained using all the task segments exceeded the corresponding inter-rater agreement [crowd classifier 85% (SE 2%), expert classifier 89% (SE 3%)]. We predicted the overall global rating scores (GRS) for the task with a root-mean-squared error lower than one standard deviation of the ground-truth GRS. We observed a high correlation between the segment-level scores obtained using the crowd and expert preference classifiers (ρ ≥ 0.86). The task-level scores obtained using the crowd and expert preference classifiers were also highly correlated with each other (ρ ≥ 0.84) and were statistically equivalent within a margin of two points (on a scale ranging from 6 to 30). Our analyses, however, did not demonstrate statistically significant equivalence in accuracy between the crowd and expert classifiers within a 10% margin.
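As a companion to the metrics reported above, the following minimal sketch shows how Fleiss' kappa and the RMSE validity check can be computed; the rating matrix and GRS arrays are toy stand-ins, not the study data.

```python
# Minimal sketch of the reliability and validity checks reported above.
# The rating matrix and GRS arrays below are toy stand-ins, not study data.
import numpy as np


def fleiss_kappa(counts):
    """counts: (n_items, n_categories) array; counts[i, j] = number of raters
    who chose category j (e.g., 'first segment performed better') for item i.
    Assumes the same number of raters for every item."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    p_j = counts.sum(axis=0) / (n_items * n_raters)        # category proportions
    P_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)


def rmse(predicted, truth):
    predicted, truth = np.asarray(predicted, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((predicted - truth) ** 2)))


# Toy example: 4 segment pairs, 3 raters each, 2 categories (which was better).
ratings = [[3, 0], [2, 1], [1, 2], [3, 0]]
print(fleiss_kappa(ratings))

# Validity criterion used above: RMSE of predicted GRS should fall below one
# standard deviation of the ground-truth GRS.
pred_grs, true_grs = [18.0, 24.5, 12.0], [20.0, 25.0, 11.0]
print(rmse(pred_grs, true_grs) < np.std(true_grs))
```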
Our framework, implemented using crowdsourced pairwise comparisons, yields valid objective surgical skill assessments both for segments within a task and for the task overall. Crowdsourcing efficiently provides reliable pairwise comparisons of skill for segments within a task. Our framework may be deployed within surgical training programs for objective, automated, and standardized evaluation of technical skills.