

Crowdsourcing Morphology Assessments in Oculoplastic Surgery: Reliability and Validity of Lay People Relative to Professional Image Analysts and Experts.

Author Information

Division of Orbital and Ophthalmic Plastic Surgery, Stein Eye Institute, University of California, Los Angeles.

Doheny Eye Institute, University of California, Los Angeles.

Publication Information

Ophthalmic Plast Reconstr Surg. 2020 Mar/Apr;36(2):178-181. doi: 10.1097/IOP.0000000000001515.

Abstract

PURPOSE

To determine whether crowdsourced ratings of oculoplastic surgical outcomes provide information as reliable as that from professional graders and oculoplastic experts.

METHODS

In this prospective psychometric evaluation, a reference scale for rating postoperative eyelid swelling was constructed from randomly selected images with input from topic experts. This scale was presented adjacent to 205 test images, including 10% duplicates. Graders were instructed to match each test image to the reference image it most closely resembled. Three sets of graders were solicited: crowdsourced lay people from the Amazon Mechanical Turk marketplace, professional graders from the Doheny Image Reading Center (DIRC), and American Society of Ophthalmic Plastic and Reconstructive Surgery surgeons. Performance was assessed by classical correlational analysis and by generalizability theory.
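For readers unfamiliar with generalizability theory, the sketch below illustrates the kind of variance decomposition it entails for a fully crossed image × rater design (every rater scores every image once). The sample sizes, variable names, and simulated scores are illustrative assumptions, not the study's data or analysis code.

```python
# Minimal sketch of a generalizability-style variance decomposition for a
# fully crossed image x rater design with one score per cell.
# Simulated stand-in data; not the authors' dataset or code.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_raters = 205, 10                       # sizes are assumptions
scores = rng.integers(1, 6, (n_images, n_raters)).astype(float)

grand = scores.mean()
image_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Two-way random-effects ANOVA sums of squares and mean squares.
ss_image = n_raters * ((image_means - grand) ** 2).sum()
ss_rater = n_images * ((rater_means - grand) ** 2).sum()
ss_resid = ((scores - grand) ** 2).sum() - ss_image - ss_rater

ms_image = ss_image / (n_images - 1)
ms_rater = ss_rater / (n_raters - 1)
ms_resid = ss_resid / ((n_images - 1) * (n_raters - 1))

# Solve the expected-mean-square equations for the variance components.
var_resid = ms_resid
var_image = max((ms_image - ms_resid) / n_raters, 0.0)
var_rater = max((ms_rater - ms_resid) / n_images, 0.0)

total = var_image + var_rater + var_resid
for name, v in (("image", var_image), ("rater", var_rater),
                ("residual", var_resid)):
    print(f"{name:8s} share of variance: {100 * v / total:5.1f}%")
```

On purely random simulated scores the image and rater shares come out near zero; on real ratings, a dominant image component (as reported in the Results below) would indicate that scores are driven mostly by true differences between photos rather than by who is rating them.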

RESULTS

For the 19 repeated images, the correlation between the first and second ratings was 0.60 for lay observers, 0.80 for DIRC graders, and 0.84 for oculoplastic experts. In terms of inter-group reliability across all photos, the scores from lay observers correlated with those of DIRC graders at r = 0.88 and with those of experts at r = 0.79. In all groups, the pictures themselves accounted for the greatest share of score variation. The share of variation attributable to the rater was highest in the lay group at 25%, versus 20% for DIRC graders and 21% for experts.
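As a companion sketch, the two correlational measures reported above could be computed along these lines. The arrays (and names such as lay_means) are hypothetical simulated stand-ins, not the study's data.

```python
# Minimal sketch of the two correlational analyses: test-retest reliability
# on the repeated images and inter-group agreement on per-photo mean scores.
# All arrays are simulated for illustration only.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Test-retest: each duplicated image is scored twice; correlate the passes.
first_pass = rng.normal(3.0, 1.0, 19)
second_pass = first_pass + rng.normal(0.0, 0.8, 19)
r_retest, _ = pearsonr(first_pass, second_pass)
print(f"test-retest r = {r_retest:.2f}")

# Inter-group: correlate per-photo mean scores between rater groups.
truth = rng.normal(3.0, 1.0, 205)               # latent swelling per photo
lay_means = truth + rng.normal(0.0, 0.6, 205)   # noisier lay averages
dirc_means = truth + rng.normal(0.0, 0.4, 205)
expert_means = truth + rng.normal(0.0, 0.3, 205)
print(f"lay vs DIRC:   r = {pearsonr(lay_means, dirc_means)[0]:.2f}")
print(f"lay vs expert: r = {pearsonr(lay_means, expert_means)[0]:.2f}")
```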

CONCLUSIONS

Crowdsourced observers are insufficiently precise to replicate the results of experts in grading postoperative eyelid swelling. DIRC graders performed similarly to experts and present a less resource-intensive option.

