Lin Yun C, Mallia Daniel, Clark-Sevilla Andrea O, Catto Adam, Leshchenko Alisa, Yan Qi, Haas David M, Wapner Ronald, Pe'er Itsik, Raja Anita, Salleb-Aouissi Ansaf
Department of Computer Science, Columbia University, 1214 Amsterdam Ave, 721 Schapiro CEPSR, New York, NY, 10027, USA.
Department of Computer Science, CUNY Hunter College, New York, NY, USA.
BMC Pregnancy Childbirth. 2024 Dec 24;24(1):853. doi: 10.1186/s12884-024-06988-w.
Preeclampsia is one of the leading causes of maternal morbidity, with consequences during and after pregnancy. Because of its diverse clinical presentation, preeclampsia is an adverse pregnancy outcome that is uniquely challenging to predict and manage. In this paper, we developed racial bias-free machine learning models that predict the onset of preeclampsia with severe features or eclampsia at discrete time points in a nulliparous pregnant study cohort. To focus on those most at risk, we selected probands with severe preeclampsia (sPE). Those with mild preeclampsia, superimposed preeclampsia, and new-onset hypertension were excluded.
The prospective study cohort to which we applied machine learning is the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b), which contains information from eight clinical sites across the US. Maternal serum samples were collected from 1,857 individuals between the first and second trimesters. These participants formed the final cohort.
Our prediction models achieved an AUROC of 0.72 (95% CI, 0.69-0.76), 0.75 (95% CI, 0.71-0.79), and 0.77 (95% CI, 0.74-0.80) for the three visits, respectively. Our initial models were biased toward non-Hispanic Black participants, with a high predictive equality ratio of 1.31. We corrected this bias and reduced the ratio to 1.14, which lowers the false positive rate of our predictive model for non-Hispanic Black participants. The exact cause of the bias is still under investigation, but previous studies have recognized placental growth factor (PLGF) as a potential bias-inducing factor. However, since our model includes various factors that exhibit a positive correlation with PLGF, such as blood pressure measurements and BMI, we employed an algorithmic approach to disentangle this bias from the model.
The top features of our model stress the importance of using several tests, particularly for biomarkers (BMI and blood pressure measurements) and ultrasound measurements. Placental analytes (PLGF and Endoglin) were strong predictors for screening for the early onset of preeclampsia with severe features in the first two trimesters.
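The predictive equality ratio reported above compares false positive rates between groups. The abstract does not spell out the exact formula or the reference group, so the following is only a minimal sketch under one common reading: the false positive rate of the group of interest divided by that of a reference group. The function names, group labels, and toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN), computed over the true-negative cases only."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no negative cases; FPR is undefined")
    return fp / (fp + tn)


def predictive_equality_ratio(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
    group: str,
    reference: str,
) -> float:
    """Assumed definition: FPR(group) / FPR(reference). 1.0 means equal FPRs."""

    def subset(label: str):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == label]
        return [t for t, _ in rows], [p for _, p in rows]

    fpr_group = false_positive_rate(*subset(group))
    fpr_reference = false_positive_rate(*subset(reference))
    if fpr_reference == 0:
        raise ValueError("reference group has zero FPR; ratio is undefined")
    return fpr_group / fpr_reference


# Toy data (hypothetical): 1 = predicted/observed case, 0 = no case.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 5

ratio = predictive_equality_ratio(y_true, y_pred, groups, group="A", reference="B")
print(f"predictive equality ratio (A vs. B): {ratio:.2f}")
```

Under this reading, a ratio above 1 means the group of interest receives proportionally more false positives than the reference group, which is consistent with the reduction from 1.31 to 1.14 being described as lowering false positives for that group.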