Sue & Bill Gross School of Nursing, University of California Irvine, Irvine, California, USA
Stanford University, Stanford, California, USA.
BMJ Health Care Inform. 2023 Jan;30(1). doi: 10.1136/bmjhci-2022-100666.
Survival machine learning (ML) has been suggested as a useful approach for forecasting future events, but there is growing concern that ML models can cause racial disparities through the data used to train them. This study aims to develop race/ethnicity-specific survival ML models for Hispanic and black women diagnosed with breast cancer and to examine whether these models outperform general models trained on data from all races/ethnicities.
We used data from the US National Cancer Institute's Surveillance, Epidemiology and End Results programme registries. We developed Hispanic-specific and black-specific models and compared them with the general model using four algorithms: the Cox proportional-hazards model, gradient-boosted trees, survival trees and survival support vector machines.
A total of 322 348 female patients who had breast cancer diagnoses between 1 January 2000 and 31 December 2017 were identified. The race/ethnicity-specific models for Hispanic and black women consistently outperformed the general model when predicting outcomes for the corresponding race/ethnicity group.
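To make the comparison concrete, the sketch below shows one way a subgroup-specific model and a general model could be scored on the same subgroup test set. The abstract does not name the evaluation metric, so Harrell's concordance index (a common choice for survival models) is an assumption here. The function name `concordance_index`, the variable names, and all numbers are hypothetical toy values for illustration, not data or results from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs whose risk ordering matches
    their survival ordering (higher risk should mean shorter survival)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time is an observed event rather than a censored follow-up.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy subgroup test set (hypothetical): follow-up time and event flag
# (1 = death observed, 0 = censored).
times = [2, 5, 7, 9]
events = [1, 1, 0, 1]

# Hypothetical predicted risks from two models for the same four patients.
specific_scores = [0.9, 0.7, 0.4, 0.2]  # subgroup-specific model
general_scores = [0.6, 0.8, 0.4, 0.2]   # general model

c_specific = concordance_index(times, events, specific_scores)
c_general = concordance_index(times, events, general_scores)
print(f"specific: {c_specific:.2f}, general: {c_general:.2f}")
```

In this toy case the subgroup-specific scores order every comparable pair correctly, while the general scores misorder one of five pairs. A real evaluation would use held-out registry data and the metric actually reported in the full paper.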
Accurately predicting the survival outcome of a patient is critical in determining treatment options and providing appropriate cancer care. The high-performing models developed in this study can contribute to providing individualised oncology care and improving the survival outcome of black and Hispanic women.
Predicting the individualised survival outcome of breast cancer can provide the evidence necessary for determining treatment options and high-quality, patient-centred cancer care delivery for under-represented populations. Also, the race/ethnicity-specific ML models can mitigate representation bias and contribute to addressing health disparities.