Wong Emily F, Saini Anil K, Accortt Eynav E, Wong Melissa S, Moore Jason H, Bright Tiffani J
Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California.
Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, California.
JAMA Netw Open. 2024 Dec 2;7(12):e2438152. doi: 10.1001/jamanetworkopen.2024.38152.
IMPORTANCE: Machine learning for augmented screening of perinatal mood and anxiety disorders (PMADs) requires thorough consideration of clinical biases embedded in electronic health records (EHRs) and rigorous evaluation of model performance.
OBJECTIVE: To mitigate bias in predictive models of PMADs trained on commonly available EHRs.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study used data collected from 2020 to 2023 as part of a quality improvement initiative at Cedars-Sinai Medical Center in Los Angeles, California. Inclusion criteria were birthing patients aged 14 to 59 years with a live birth record who were admitted to the postpartum unit or the maternal-fetal care unit after delivery.
EXPOSURE: Patient-reported race and ethnicity (7 levels) obtained through EHRs.
MAIN OUTCOMES AND MEASURES: Logistic regression, random forest, and extreme gradient boosting models were trained to predict 2 binary outcomes: a moderate- to high-risk (positive) screen on the 9-item Patient Health Questionnaire (PHQ-9) and a moderate- to high-risk (positive) screen on the Edinburgh Postnatal Depression Scale (EPDS). Each model was fitted with and without reweighing of the data during preprocessing and was evaluated through repeated k-fold cross-validation. In every iteration, each model was evaluated on its area under the receiver operating characteristic curve (AUROC) and on 2 fairness metrics: demographic parity (DP) and the difference in false-negative rates between racial and ethnic groups (relative to non-Hispanic White patients).
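The abstract does not say how reweighing was implemented. A widely used preprocessing scheme, Kamiran and Calders' reweighing (also available as the `Reweighing` class in IBM's AIF360 toolkit), assigns each training example the weight w(g, y) = P(g)P(y)/P(g, y), where g is the race and ethnicity group and y is the screening label, so that group and label are statistically independent in the weighted data. The plain-Python sketch below illustrates that scheme under this assumption; the function name `reweighing_weights` is hypothetical and not taken from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def reweighing_weights(groups: Sequence[Hashable], labels: Sequence[int]) -> list[float]:
    """Hypothetical helper: one weight per sample, w(g, y) = P(g) * P(y) / P(g, y).

    Assumes the Kamiran-Calders reweighing scheme; the study's exact
    implementation is not described in the abstract.
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # Expected joint count under independence (n_g * n_y / n) divided by the
    # observed joint count n_{g,y}: under-represented (group, label) cells are
    # up-weighted and over-represented cells are down-weighted.
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Toy data: group "A" has 1 positive in 4, group "B" has 3 positives in 4.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighing_weights(groups, labels)
print([round(w, 3) for w in weights])
```

After weighting, the positive rate in each toy group is 0.5, so the label no longer depends on group membership. The resulting list could be passed as the `sample_weight` argument of `fit` in scikit-learn's `LogisticRegression` or `RandomForestClassifier`, or in XGBoost's `XGBClassifier`; within cross-validation the weights would normally be computed on each training fold only.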
RESULTS: Among 19 430 patients in this study, 1402 (7%) identified as African American or Black, 2371 (12%) as Asian American and Pacific Islander, 1842 (10%) as Hispanic White, 10 942 (56.3%) as non-Hispanic White, 606 (3%) as multiple races, and 2146 (11%) as other (not further specified); 121 (<1%) did not provide this information. The mean (SD) age was 34.1 (4.9) years, and all patients identified as female. Racial and ethnic minority patients were significantly more likely than non-Hispanic White patients to screen positive on both the PHQ-9 (odds ratio, 1.47 [95% CI, 1.23-1.77]) and the EPDS (odds ratio, 1.38 [95% CI, 1.20-1.57]). Mean AUROCs ranged from 0.610 to 0.635 without reweighing (baseline) and from 0.602 to 0.622 with reweighing. Baseline models predicted a significantly greater prevalence of postpartum depression for patients who were not non-Hispanic White than for non-Hispanic White patients (mean DP, 0.238 [95% CI, 0.231-0.244]; P < .001) and had significantly lower false-negative rates for these patients (mean difference, -0.184 [95% CI, -0.195 to -0.174]; P < .001). Reweighing significantly reduced the differences between racial and ethnic groups in DP (mean DP with reweighing, 0.022 [95% CI, 0.017-0.026]; P < .001) and in false-negative rates (mean difference with reweighing, 0.018 [95% CI, 0.008-0.028]; P < .001).
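The quantities reported above can be illustrated with a short sketch. It assumes that predictions are binary (0/1), that DP is the predicted-positive rate of patients who are not non-Hispanic White minus that of non-Hispanic White patients, and that the false-negative difference is taken in the same direction, so a negative value means fewer missed positive screens in the non-reference group. It also shows an unadjusted odds ratio from a 2x2 table with a Wald 95% CI; the abstract does not state how the published odds ratios were estimated. All function names are hypothetical, and each function computes a single value, whereas the study reports means over cross-validation iterations.

```python
import math
from typing import Hashable, Sequence


def _rate(numerator: float, denominator: int) -> float:
    # Return NaN for an empty subgroup instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else float("nan")


def _split(values: Sequence, groups: Sequence[Hashable], reference: Hashable):
    """Partition values into (reference group, all other groups)."""
    ref = [v for v, g in zip(values, groups) if g == reference]
    other = [v for v, g in zip(values, groups) if g != reference]
    return ref, other


def demographic_parity_difference(y_pred, groups, reference) -> float:
    """Predicted-positive rate of non-reference patients minus that of the reference group."""
    ref, other = _split(y_pred, groups, reference)
    return _rate(sum(other), len(other)) - _rate(sum(ref), len(ref))


def false_negative_rate(y_true, y_pred) -> float:
    """Share of true positives (y_true == 1) that the model predicted as negative."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    misses = sum(1 for p in preds_on_positives if p == 0)
    return _rate(misses, len(preds_on_positives))


def fnr_difference(y_true, y_pred, groups, reference) -> float:
    """FNR of non-reference patients minus FNR of the reference group."""
    ref, other = _split(list(zip(y_true, y_pred)), groups, reference)

    def fnr(pairs):
        return false_negative_rate([t for t, _ in pairs], [p for _, p in pairs])

    return fnr(other) - fnr(ref)


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted OR for a 2x2 table with a Wald CI on the log scale.

    a = exposed & positive, b = exposed & negative,
    c = unexposed & positive, d = unexposed & negative.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Toy example with "NHW" as the reference group (not study data).
groups = ["NHW"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(demographic_parity_difference(y_pred, groups, "NHW"))  # 0.75 - 0.25
print(fnr_difference(y_true, y_pred, groups, "NHW"))  # 0.0 - 0.5
print(odds_ratio_wald(20, 80, 10, 90))
```

In the toy data the model flags more non-reference patients as positive (DP of 0.5) and misses none of their positive screens (FNR difference of -0.5), which mirrors the direction, though not the size, of the baseline results.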
CONCLUSIONS AND RELEVANCE: In this diagnostic study of predictive models of postpartum depression, clinical prediction models trained to predict psychometric screening results from commonly available EHRs achieved modest performance and were less likely to widen existing health disparities in PMAD diagnosis and, potentially, treatment. These findings suggest that it is critical for researchers and physicians to consider their model design (eg, desired target and predictor variables) and to evaluate model bias in order to minimize health disparities.