Weir Vanessa R, Dempsey Katelyn, Gichoya Judy Wawira, Rotemberg Veronica, Wong An-Kwok Ian
Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Duke University, Durham, NC, USA.
NPJ Digit Med. 2024 Jul 17;7(1):191. doi: 10.1038/s41746-024-01176-8.
Increasing evidence indicates that noninvasive assessment tools, such as pulse oximetry, temperature probes, and AI skin diagnosis benchmarks, are less accurate in patients with darker skin tones. The FDA is exploring regulatory strategies that would add skin tone criteria to improve device performance across diverse skin tones. However, there is no consensus on how prospective studies should assess skin tone in order to account for this bias. Several tools are available for skin tone assessment, including administered visual scales (e.g., Fitzpatrick Skin Type, Pantone, Monk Skin Tone) and color measurement tools (e.g., reflectance colorimeters, reflectance spectrophotometers, cameras), although none is consistently used or validated across multiple medical domains. Accurate and consistent skin tone measurement depends on many factors, including standardized environments, lighting, body parts assessed, patient conditions, and choice of skin tone assessment tool(s). Because race and ethnicity are inadequate proxies for skin tone, these considerations can help standardize how the effect of skin tone is assessed in studies of AI dermatology diagnosis, pulse oximetry, and temporal thermometers. Skin tone bias in medical devices is likely due to systemic factors that lead to inadequate validation across diverse skin tones. Researchers have an opportunity to apply skin tone assessment methods, with these standardized considerations, in prospective studies of noninvasive tools that may be affected by skin tone. We propose considerations that researchers must take into account in order to improve device robustness to skin tone bias.