Validity of two subjective skin tone scales and its implications on healthcare model fairness.

Authors

Cu Cassandra W, Dundas Nicole E, Heintz Timothy, Sheikh Zahida A, Alonso-Bermudez Bianca, Walker Jasmine, Wooten Avery, Badathala Anusha, Chapman Allyson, Ehie Odinakachukwu, Raghunathan Karthik, Mills Hunter, Espejo Edie, Boscardin John, Wallace Arthur W, Cobert Julien

Affiliations

Tufts University School of Medicine, Boston, MA, USA.

UC Berkeley Department of Bioengineering, Berkeley, CA, USA.

Publication information

NPJ Digit Med. 2025 Oct 3;8(1):595. doi: 10.1038/s41746-025-01975-7.

Abstract

Skin tone assessments are critical for fairness evaluation in healthcare algorithms (e.g., pulse oximetry) but lack validation. Using prospectively collected facial images from 90 hospitalized adults at the San Francisco VA, three independent annotators rated facial regions in triplicate using Fitzpatrick (I-VI) and Monk (1-10) skin tone scales. Patients also self-identified their skin tone. Annotator confidence was recorded using 5-point Likert scales. Across 810 images in 90 patients (9 images each), within-rater agreement was high, but inter-annotator agreement was moderate to low. Annotators frequently rated patients as darker when patients self-identified as lighter, and lighter when patients self-identified as darker. In linear mixed-effects models controlling for facial region and annotator confidence, darker self-reported skin tones were associated with lighter annotator scores. These findings highlight challenges in consistent skin tone labeling and suggest that current methods for assessing representation in biosensor-based algorithm studies may be influenced by labeling bias.
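
The abstract reports inter-annotator agreement but does not name the statistic used. As a minimal illustration of what such a measure captures, the sketch below computes unweighted Cohen's kappa for two annotators. The function name and the ratings are hypothetical and are not data from the study; for ordinal scales such as Fitzpatrick or Monk, a weighted kappa or an intraclass correlation may be the more appropriate choice.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical Fitzpatrick ratings (coded 1-6) of the same eight images.
annotator_1 = [2, 3, 3, 5, 4, 2, 6, 3]
annotator_2 = [2, 4, 3, 4, 4, 3, 5, 3]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so two annotators who match on half the images can still score well below 0.5.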

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b3c/12494915/869ddec697fb/41746_2025_1975_Fig1_HTML.jpg
