Kelly Jason D, Comstock Bryan, Kowalewski Timothy M, Smartt James M
Department of Mechanical Engineering, University of Minnesota, Minneapolis, Minn.
Department of General Internal Medicine, University of Washington, Seattle, Wa.
Plast Reconstr Surg Glob Open. 2021 Jan 25;9(1):e3315. doi: 10.1097/GOX.0000000000003315. eCollection 2021 Jan.
Reliable and valid assessments of the visual endpoints of aesthetic surgery procedures are needed. Currently, most assessments are based on the opinion of patients and their plastic surgeons. The objective of this research was to analyze the reliability of crowdworkers assessing de-identified photographs using a validated scale that depicts lower facial aging.
Twenty photographs of the nasolabial region were obtained from non-identifiable faces showing varying degrees of facial aging. Independent crowds of 100 crowdworkers were tasked with assessing the degree of aging using a photograph numeric scale. Independent groups of crowdworkers were surveyed at 4 different times (weekday daytime, weekday nighttime, weekend daytime, weekend nighttime), once a week for 2 weeks.
Crowds assessing midface region photographs had an overall correlation of R = 0.979 (weekday daytime R = 0.991; weekday nighttime R = 0.985; weekend daytime R = 0.997; weekend nighttime R = 0.985). Bland-Altman analysis of test-retest agreement showed a normal distribution of assessments across the times tested, with the rating differences for the majority of photographs falling within 1 SD of the average difference in ratings.
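The test-retest comparison described above can be illustrated with a minimal Python sketch. It assumes each photograph is summarized by the mean crowd rating from each of two survey sessions; the function names, the 1.96 multiplier for limits of agreement, and the sample ratings are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bland_altman(x, y):
    """Bias, SD of differences, limits of agreement, and the fraction of
    paired differences lying within 1 SD of the mean difference."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    within = sum(abs(d - bias) <= sd for d in diffs) / len(diffs)
    return {
        "bias": bias,
        "sd": sd,
        # Conventional 95% limits of agreement (an assumption here).
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
        "fraction_within_1sd": within,
    }


if __name__ == "__main__":
    # Hypothetical mean ratings per photograph for two sessions.
    week1 = [1.1, 2.0, 2.9, 4.2, 5.0]
    week2 = [1.0, 2.2, 3.0, 4.0, 4.9]
    print("r =", round(pearson_r(week1, week2), 3))
    print(bland_altman(week1, week2))
```

A high correlation alone does not rule out a systematic shift between sessions, which is why the sketch reports the Bland-Altman bias alongside it.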
Crowd assessments of facial aging in de-identified photographs displayed very strong concordance with each other, regardless of time of day or day of the week. This shows promise toward obtaining reliable assessments of pre- and postoperative results for aesthetic surgery procedures. More work must be done to quantify the reliability of assessments for other pretreatment states and for the corresponding results following treatment.