Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States.
Department of Obstetrics and Gynecology, Weill Cornell Medicine, New York, NY 10065, United States.
J Am Med Inform Assoc. 2024 May 20;31(6):1258-1267. doi: 10.1093/jamia/ocae056.
We developed and externally validated a machine-learning model to predict postpartum depression (PPD) using data from electronic health records (EHRs). Efforts are under way to implement the PPD prediction model within the EHR system for clinical decision support. We describe the pre-implementation evaluation process, which considered model performance, fairness, and clinical appropriateness.
We used EHR data from an academic medical center (AMC) and a clinical research network database from 2014 to 2020 to evaluate the predictive performance and net benefit of the PPD risk model. We used area under the curve and sensitivity as measures of predictive performance and conducted a decision curve analysis. To assess model fairness, we used metrics such as disparate impact, equal opportunity, and predictive parity, with White race as the privileged group. Multidisciplinary experts also reviewed the model for clinical appropriateness. Lastly, we debiased the model by comparing 5 debiasing approaches based on fairness through blindness and reweighing.
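The three fairness metrics and the reweighing approach named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes binary predictions, collapses race into White (privileged) versus all other groups, uses the common ratio form for disparate impact and difference forms for equal opportunity and predictive parity, and implements Kamiran and Calders' reweighing, w(a, y) = P(A = a)P(Y = y)/P(A = a, Y = y). All function names and the toy data are hypothetical. Fairness through blindness, by contrast, needs no extra code beyond dropping race from the model's input features.

```python
from collections import Counter


def _safe_div(num, den):
    # Return NaN for empty groups instead of raising ZeroDivisionError.
    return num / den if den else float("nan")


def _group_rates(y_true, y_pred, mask):
    """Selection rate, sensitivity (TPR), and PPV for rows where mask is True."""
    rows = [(t, p) for t, p, m in zip(y_true, y_pred, mask) if m]
    tp = sum(1 for t, p in rows if t == 1 and p == 1)
    fp = sum(1 for t, p in rows if t == 0 and p == 1)
    fn = sum(1 for t, p in rows if t == 1 and p == 0)
    return {
        "selection_rate": _safe_div(tp + fp, len(rows)),
        "tpr": _safe_div(tp, tp + fn),
        "ppv": _safe_div(tp, tp + fp),
    }


def fairness_metrics(y_true, y_pred, race, privileged="White"):
    """Compare everyone not in `privileged` (unprivileged) against the privileged group."""
    is_priv = [r == privileged for r in race]
    priv = _group_rates(y_true, y_pred, is_priv)
    unpriv = _group_rates(y_true, y_pred, [not m for m in is_priv])
    return {
        # Ratio of positive-prediction rates; 1.0 indicates parity.
        "disparate_impact": _safe_div(unpriv["selection_rate"], priv["selection_rate"]),
        # Gap in sensitivity among true PPD cases; 0.0 indicates equal opportunity.
        "equal_opportunity_diff": unpriv["tpr"] - priv["tpr"],
        # Gap in positive predictive value; 0.0 indicates predictive parity.
        "predictive_parity_diff": unpriv["ppv"] - priv["ppv"],
    }


def reweighing_weights(y_true, race, privileged="White"):
    """Per-row weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    n = len(y_true)
    a = [r == privileged for r in race]
    count_a, count_y = Counter(a), Counter(y_true)
    count_ay = Counter(zip(a, y_true))
    return [
        (count_a[ai] / n) * (count_y[yi] / n) / (count_ay[(ai, yi)] / n)
        for ai, yi in zip(a, y_true)
    ]


# Toy example with made-up labels, predictions, and race values.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
race = ["White"] * 4 + ["Black", "Asian", "Black", "Asian"]
print(fairness_metrics(y_true, y_pred, race))
# Here DI = 0.50 / 0.75 (about 0.67) and the equal-opportunity gap is 0.5 - 1.0 = -0.5.
```

The reweighing weights leave the features untouched and instead rebalance the training rows so that group membership and outcome look statistically independent; many learners accept them directly (for example, through the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`).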
We determined the classification threshold through a performance evaluation that prioritized sensitivity, together with decision curve analysis. The baseline PPD model exhibited some unfairness in the AMC data but performed fairly in the clinical research network data. We revised the model using fairness through blindness, the debiasing approach that yielded the best overall performance and fairness, while also taking into account the clinical appropriateness concerns raised by the expert reviewers.
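The threshold-selection logic described above, prioritizing sensitivity and checking the result with decision curve analysis, might look like the following sketch. It assumes calibrated risk scores, so that the classification threshold can also serve as the threshold probability p_t in the standard net-benefit formula NB = TP/n - (FP/n) * p_t/(1 - p_t). The sensitivity target of 0.8, the threshold grid, the function names, and the toy data are all hypothetical; the abstract does not report the target the authors used.

```python
def _confusion(y_true, scores, threshold):
    """Count TP, FP, FN when scores >= threshold are classified as high risk."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    return tp, fp, fn


def sensitivity(y_true, scores, threshold):
    tp, _, fn = _confusion(y_true, scores, threshold)
    return tp / (tp + fn) if tp + fn else float("nan")


def net_benefit(y_true, scores, pt):
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    n = len(y_true)
    tp, fp, _ = _confusion(y_true, scores, pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy that flags every patient; treat-none always scores 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def pick_threshold(y_true, scores, min_sensitivity=0.8, grid=None):
    """Highest threshold on the grid that still meets the sensitivity target."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    # Sensitivity can only fall as the threshold rises, so the largest passing
    # threshold flags the fewest patients among those meeting the target.
    passing = [t for t in grid if sensitivity(y_true, scores, t) >= min_sensitivity]
    return max(passing) if passing else None


# Toy data: 4 PPD cases among 10 patients, with made-up risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.25, 0.1, 0.05, 0.15]
t = pick_threshold(y_true, scores)  # 0.2: the only way to flag all 4 cases
print(t, net_benefit(y_true, scores, t), net_benefit_treat_all(y_true, t))
```

A threshold chosen for sensitivity is only worth adopting where the model's net benefit exceeds both reference strategies; in this toy data the model scores 0.325 at p_t = 0.2, against 0.25 for treat-all and 0 for treat-none.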
The findings emphasize the need to evaluate intervention-specific models thoroughly for predictive performance, fairness, and clinical appropriateness before clinical implementation.