Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA.
Departments of Preventive Medicine and Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL.
Am J Obstet Gynecol. 2024 Dec;231(6):649.e1-649.e19. doi: 10.1016/j.ajog.2024.03.031. Epub 2024 Mar 26.
The prevalence of metabolic syndrome is rapidly increasing in the United States. We hypothesized that prediction models using data obtained during pregnancy can accurately predict the future development of metabolic syndrome.
This study aimed to develop machine learning models to predict the development of metabolic syndrome using factors ascertained in nulliparous pregnant individuals.
This was a secondary analysis of a prospective cohort study (Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be Heart Health Study [nuMoM2b-HHS]). Data were collected from October 2010 to October 2020 and analyzed from July 2023 to October 2023. Participants had in-person visits 2 to 7 years after their first delivery. The primary outcome was metabolic syndrome, defined by the National Cholesterol Education Program Adult Treatment Panel III criteria and measured 2 to 7 years after delivery. A total of 127 variables obtained during pregnancy were evaluated. The data set was randomly split into a training set (70%) and a test set (30%). We developed a random forest model and a lasso regression model using variables obtained during pregnancy and compared the area under the receiver operating characteristic curve of the 2 models. Starting from the model with the higher area under the receiver operating characteristic curve, we developed models that included fewer variables, selected on the basis of SHAP (SHapley Additive exPlanations) values, and compared them with the original model. The final model was to be the one with fewer variables and a noninferior area under the receiver operating characteristic curve.
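The workflow described above (70/30 split, random forest versus lasso, then a reduced model built on the top-ranked variables) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the data are synthetic (not nuMoM2b-HHS), scikit-learn is assumed to be installed, the hyperparameters are arbitrary, the lasso is implemented as L1-penalized logistic regression, and the forest's built-in impurity importance is swapped in for SHAP values to rank variables. The sketch also compares models on the held-out set for simplicity, which may differ from how the authors made their comparisons.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 127 candidate predictors, about 18% positives.
X, y = make_classification(
    n_samples=2000, n_features=127, n_informative=10,
    weights=[0.82, 0.18], random_state=0,
)

# Random 70% / 30% split, stratified so both sets keep the outcome rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Candidate 1: random forest on all variables.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Candidate 2: lasso, here an L1-penalized logistic regression on scaled inputs.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
lasso.fit(X_train, y_train)

# Area under the ROC curve for each candidate on the held-out set.
auc_forest = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])
auc_lasso = roc_auc_score(y_test, lasso.predict_proba(X_test)[:, 1])

# Rank variables and refit on the top 3.
# Impurity importance is a stand-in here; the study ranked variables by SHAP values.
top3 = np.argsort(forest.feature_importances_)[::-1][:3]
small_forest = RandomForestClassifier(n_estimators=200, random_state=0)
small_forest.fit(X_train[:, top3], y_train)
auc_small = roc_auc_score(y_test, small_forest.predict_proba(X_test[:, top3])[:, 1])

print(f"forest (127 vars): {auc_forest:.3f}")
print(f"lasso  (127 vars): {auc_lasso:.3f}")
print(f"forest (top 3):    {auc_small:.3f}")
```

A formal noninferiority comparison between the full and reduced models would need a statistical test on the paired AUCs, which this sketch does not attempt.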
A total of 4225 individuals met the inclusion criteria; the mean (standard deviation) age was 27.0 (5.6) years. Of these, 754 (17.8%) developed metabolic syndrome. The area under the receiver operating characteristic curve of the random forest model was 0.878 (95% confidence interval, 0.846-0.909), which was higher than that of the lasso model (0.850; 95% confidence interval, 0.811-0.888; P<.001). Therefore, random forest models using fewer variables were developed. The random forest model with the top 3 variables (high-density lipoprotein, insulin, and high-sensitivity C-reactive protein) was chosen as the final model because its area under the receiver operating characteristic curve, 0.867 (95% confidence interval, 0.839-0.895), was not inferior to that of the original model (P=.08). The area under the receiver operating characteristic curve of the final model in the test set was 0.847 (95% confidence interval, 0.821-0.873). An online application of the final model was developed (https://kawakita.shinyapps.io/metabolic/).
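To make the reported quantities concrete, the following minimal sketch computes an area under the receiver operating characteristic curve and a 95% confidence interval for it. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, as are the function names `auc` and `bootstrap_auc_ci` and the synthetic scores.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement and
    take the empirical alpha/2 and 1 - alpha/2 quantiles of the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one class.
        if 0 < sum(sample_labels) < n:
            stats.append(auc(sample_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic example (assumption): about 18% positives with higher average scores.
rng = random.Random(1)
labels = [1 if rng.random() < 0.18 else 0 for _ in range(200)]
if sum(labels) in (0, len(labels)):
    labels[0], labels[1] = 1, 0  # guard so both classes are present
scores = [rng.gauss(1.5 if l == 1 else 0.0, 1.0) for l in labels]

point = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI, {lower:.3f}-{upper:.3f})")
```

The pairwise definition is quadratic in the number of cases, which is fine for a small illustration; a rank-based formula would be preferable for a cohort of several thousand.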
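We developed a 3-variable random forest model, based on measurements obtained during pregnancy, that predicted the development of metabolic syndrome within 2 to 7 years after delivery with an area under the receiver operating characteristic curve of 0.847 in the test set.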