Division of Pharmaceutical Outcomes and Policy, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Program for Health and Clinical Informatics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
J Am Med Inform Assoc. 2019 Oct 1;26(10):977-988. doi: 10.1093/jamia/ocz036.
We aimed to investigate bias in applying machine learning to predict real-world individual treatment effects.
Using a virtual patient cohort, we simulated real-world healthcare data and applied random forest and gradient boosting classifiers to develop prediction models. Treatment effect was estimated as the difference between the predicted outcomes under a treatment and under a control. Using real-world data, we evaluated how the prediction of individual outcome was affected by predictors with known effects on treatment and outcome (ie, treatment predictors [X1], confounders [X2], treatment effect modifiers [X3], and other outcome risk factors [X4]) and by outcome imbalance. Using counterfactuals, we evaluated the percentage of patients with biased predicted individual treatment effects.
X4 had a relatively greater impact on model performance than X2 and X3 did. No effects were observed from X1. Moderate-to-severe outcome imbalance had a significantly negative impact on model performance, particularly among subgroups in which an outcome occurred. Bias in predicting individual treatment effects was significant and persisted even when the models had 100% accuracy in predicting health outcome.
Inadequate inclusion of X2, X3, and X4 and moderate-to-severe outcome imbalance may impair model performance in predicting individual outcome and, in turn, bias the predicted individual treatment effects. Machine learning models with all features and high performance for predicting individual outcome still yielded biased individual treatment effects.
Direct application of machine learning might not adequately address bias in predicting individual treatment effects. Further method development is needed to advance machine learning to support individualized treatment selection.
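The counterfactual evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' simulation: the risk function, the treatment assignment probabilities, the 0.05 tolerance used to call an estimate "biased", and the cell-mean model standing in for the random forest and gradient boosting classifiers are all assumptions chosen for illustration. It shows only the mechanics of estimating an individual treatment effect as the difference between predicted outcomes under treatment and under control, then comparing it with the known counterfactual effect.

```python
import random
from collections import defaultdict

def true_risk(t, x2, x3):
    """Assumed outcome risk: x2 raises risk, treatment lowers it, x3 modifies the effect."""
    return 0.25 + 0.4 * x2 - t * (0.1 + 0.1 * x3)

def simulate_cohort(n, seed=0):
    """Virtual cohort in which the confounder x2 drives both treatment and outcome."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x2 = rng.randint(0, 1)  # confounder (hypothetical binary X2)
        x3 = rng.randint(0, 1)  # treatment effect modifier (hypothetical binary X3)
        t = 1 if rng.random() < (0.8 if x2 else 0.2) else 0
        y = 1 if rng.random() < true_risk(t, x2, x3) else 0
        cohort.append({"x2": x2, "x3": x3, "t": t, "y": y})
    return cohort

def fit_cell_means(cohort, features):
    """Stand-in for a classifier: observed outcome rate per (treatment, features) cell."""
    sums = defaultdict(lambda: [0, 0])
    for p in cohort:
        key = (p["t"],) + tuple(p[f] for f in features)
        sums[key][0] += p["y"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def share_biased(cohort, features, tol=0.05):
    """Fraction of patients whose estimated effect misses the true effect by more than tol."""
    model = fit_cell_means(cohort, features)
    biased = 0
    for p in cohort:
        x = tuple(p[f] for f in features)
        # Estimated effect: predicted outcome if treated minus predicted outcome if not.
        estimated = model[(1,) + x] - model[(0,) + x]
        # True effect from the known counterfactual risks of the simulation.
        truth = true_risk(1, p["x2"], p["x3"]) - true_risk(0, p["x2"], p["x3"])
        if abs(estimated - truth) > tol:
            biased += 1
    return biased / len(cohort)

cohort = simulate_cohort(50_000)
print("all features:", share_biased(cohort, ["x2", "x3"]))
print("confounder omitted:", share_biased(cohort, ["x3"]))
```

In this toy setup the bias comes from omitting the confounder, so the sketch covers only the first part of the discussion above. It does not reproduce the finding that models with all features and high outcome accuracy still yielded biased individual treatment effects.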