Fiske Amelia, Blacker Sarah, Geneviève Lester Darryl, Willem Theresa, Fritzsche Marie-Christine, Buyx Alena, Celi Leo Anthony, McLennan Stuart
Institute of History and Ethics in Medicine, Department of Preclinical Medicine, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany.
Department of Social Science, York University, Toronto, ON, Canada.
Lancet Digit Health. 2025 Apr;7(4):e286-e294. doi: 10.1016/j.landig.2025.01.003.
Many countries around the world do not collect race and ethnicity data in clinical settings. Without such identified data, it is difficult to identify biases in the training data or output of a given artificial intelligence (AI) algorithm, and to work towards medical AI tools that do not exclude or further harm marginalised groups. However, the collection of these data also poses specific risks to racially minoritised populations and other marginalised groups. This Viewpoint weighs the risks of collecting race and ethnicity data in clinical settings against the risks of not collecting those data. The collection of more comprehensive identified data (ie, data that include personal attributes such as race, ethnicity, and sex) could benefit racially minoritised populations that have historically faced worse health outcomes and health-care access, and inadequate representation in research. However, the collection of extensive demographic data raises important concerns, including the construction of intersectional social categories (ie, race and its shifting meaning in different sociopolitical contexts), the risks of biological reductionism, and the potential for misuse, particularly in situations of historical exclusion, violence, conflict, genocide, and colonialism. Careful navigation of identified data collection is key to building better AI algorithms and to working towards medicine that does not exclude or harm marginalised groups.