Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, Brookline, MA, United States.
Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, Boston, MA, United States.
Front Endocrinol (Lausanne). 2024 Jan 30;15:1298628. doi: 10.3389/fendo.2024.1298628. eCollection 2024.
Predictive models have been used to aid early diagnosis of polycystic ovary syndrome (PCOS), though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.
This is a retrospective cohort study using electronic health records (EHR) from a safety-net hospital, covering 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was a PCOS ICD-9 diagnosis; the additional model outcomes were based on algorithm-defined PCOS. The latter was based on the Rotterdam criteria and merged laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.
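As a rough illustration of the algorithm-defined outcome described above, the sketch below applies the standard Rotterdam rule (at least two of the three features present) to per-patient flags. The class name, field names, and the `min_criteria` parameter are hypothetical. How the study derived each flag from laboratory, imaging, and ICD data, and how the four outcomes differ, is not specified here, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class RotterdamFindings:
    """Hypothetical per-patient flags; the EHR derivation of each flag is assumed."""
    irregular_menstruation: bool
    hyperandrogenism: bool
    polycystic_ovarian_morphology: bool


def meets_rotterdam(findings: RotterdamFindings, min_criteria: int = 2) -> bool:
    """Return True when at least `min_criteria` of the three features are present."""
    present = sum([
        findings.irregular_menstruation,
        findings.hyperandrogenism,
        findings.polycystic_ovarian_morphology,
    ])
    return present >= min_criteria


if __name__ == "__main__":
    # Irregular menstruation plus hyperandrogenism, no ultrasound finding.
    patient = RotterdamFindings(True, True, False)
    print(meets_rotterdam(patient))
```

The threshold is kept as a parameter only so that stricter or looser definitions can be tried; the Rotterdam criteria themselves use two of three.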
We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved average AUCs of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.
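To make the AUC figures above easier to interpret: the AUC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition on made-up labels and scores; the function name and the toy data are illustrative assumptions and do not come from the study.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = PCOS under the chosen definition).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: three of the four positive/negative pairs are ordered correctly.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, so for a cohort of this size a rank-based implementation from an established library would be the practical choice; the version here is only meant to show what the number measures.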
Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.