Feng Catherine H, Deng Fei, Disis Mary L, Gao Nan, Zhang Lanjing
Department of Molecular and Cellular Biology, Harvard University, 52 Oxford St, Cambridge, MA 02138, United States.
Department of Statistics, Harvard University, 1 Oxford St, Cambridge, MA 02138, United States.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf398.
Classification of patient multicategory survival outcomes is important for personalized cancer treatments. Machine learning (ML) algorithms have increasingly been used to inform healthcare decisions, but these models are vulnerable to biases in data collection and algorithm creation. ML models have previously been shown to exhibit racial bias, but their fairness towards patients from different age and sex groups has yet to be studied. Therefore, we compared the multimetric performances of five ML models (random forests, multinomial logistic regression, linear support vector classifier, linear discriminant analysis, and multilayer perceptron) when classifying colorectal cancer patients (n = 589) of various age, sex, and racial groups using The Cancer Genome Atlas data. All five models exhibited biases for these sociodemographic groups. We then repeated the same process on lung adenocarcinoma (n = 515) to validate our findings. Surprisingly, most models tended to perform more poorly overall for the largest sociodemographic groups. Methods to optimize model performance, including testing the model on merged age, sex, or racial groups, and creating a model that is trained on and used for an individual or merged sociodemographic group, show potential to reduce disparities in model performance across groups. This is supported by our regression analysis, which shows that model choice and the methodology used are associated with reduced performance disparities across demographic subgroups. Notably, these methods may be used to improve ML fairness without penalizing the model for exhibiting bias and thereby sacrificing overall performance.
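The per-group comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the three outcome labels, the choice of accuracy and macro-F1 as the metrics, and the toy data are all assumptions made for illustration. It only shows one way to score a classifier's predictions separately for each sociodemographic group and to summarize the gap between groups.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true.

    Assumption: classes that appear only in y_pred are ignored; other
    libraries may average over a different label set.
    """
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def per_group_metrics(y_true, y_pred, groups):
    """Score predictions separately for each group label."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {
        g: {
            "n": len(ts),
            "accuracy": accuracy(ts, ps),
            "macro_f1": macro_f1(ts, ps),
        }
        for g, (ts, ps) in by_group.items()
    }


def disparity(metrics, name):
    """Gap between the best- and worst-scoring group for one metric."""
    values = [m[name] for m in metrics.values()]
    return max(values) - min(values)


# Hypothetical toy data: three survival categories and a binary sex attribute.
y_true = ["short", "mid", "long", "short", "mid", "long", "short", "long"]
y_pred = ["short", "mid", "long", "short", "long", "long", "mid", "long"]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

metrics = per_group_metrics(y_true, y_pred, groups)
for group, scores in sorted(metrics.items()):
    print(group, scores)
print("accuracy disparity:", disparity(metrics, "accuracy"))
```

Under the same assumptions, the merged-group idea from the abstract would amount to mapping several group labels to one before calling `per_group_metrics`, and a group-specific model would be trained and scored on the rows of a single group only.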