Wang Chen-Yu, Pei Dee, Wang Chun-Kai, Ke Jyun-Cheng, Lee Siou-Ting, Chu Ta-Wei, Liang Yao-Jen
Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City, Taiwan.
Department of Medicine, Medical School, Fu Jen Catholic University, Department of Endocrinology and Metabolism, Fu Jen Catholic University Hospital, New Taipei City, Taiwan.
Taiwan J Obstet Gynecol. 2025 Jan;64(1):68-75. doi: 10.1016/j.tjog.2024.09.019.
With an estimated global prevalence ranging from 5% to 21%, polycystic ovary syndrome (PCOS) is one of the most common hormonal disorders. Many factors have been found to be associated with PCOS; however, most of these studies used traditional methods such as multiple logistic regression (LR). Machine learning (Mach-L) has recently emerged as an alternative approach in medical research. The present study had two goals: (1) to compare the accuracy of five Mach-L techniques with that of conventional LR, and (2) to use Mach-L to predict PCOS and rank its risk factors.
In total, 170 patients with PCOS and 950 control participants were included. We collected demographic, biochemical, and lifestyle data. PCOS was diagnosed using the Rotterdam criteria. Five Mach-L algorithms were applied: random forest (RF), stochastic gradient boosting (SGB), multivariate adaptive regression splines (MARS), extreme gradient boosting (XGBoost), and gradient boosting with categorical features support (CatBoost). Models with lower estimation errors were considered better.
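The Rotterdam criteria classify a woman as having PCOS when at least two of three features are present and other causes of the same picture have been excluded: oligo- and/or anovulation, clinical and/or biochemical hyperandrogenism, and polycystic ovarian morphology on ultrasound. The abstract does not say how each feature was recorded, so this minimal Python sketch simply assumes each has already been judged as a yes/no finding. `RotterdamFindings` and `meets_rotterdam` are hypothetical names for illustration, not part of the study.

```python
from dataclasses import dataclass


@dataclass
class RotterdamFindings:
    """Hypothetical yes/no record of the three Rotterdam features for one patient."""
    oligo_or_anovulation: bool
    hyperandrogenism: bool  # clinical and/or biochemical signs
    polycystic_ovarian_morphology: bool  # assessed on ultrasound
    # Assumed flag: other etiologies (e.g. congenital adrenal hyperplasia,
    # androgen-secreting tumours, Cushing's syndrome) have been ruled out.
    other_etiologies_excluded: bool = True


def meets_rotterdam(f: RotterdamFindings) -> bool:
    """Return True when at least two of the three features are present
    and other causes have been excluded (the 2-of-3 rule)."""
    present = sum([
        f.oligo_or_anovulation,
        f.hyperandrogenism,
        f.polycystic_ovarian_morphology,
    ])
    return f.other_etiologies_excluded and present >= 2


print(meets_rotterdam(RotterdamFindings(True, False, True)))   # True
print(meets_rotterdam(RotterdamFindings(False, True, False)))  # False
```

Encoding the rule as a function keeps the outcome label explicit and testable, which matters when the same label is later used to train and compare several models.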
Using t-tests, we found that subjects with PCOS were younger and had higher glutamic oxaloacetic transaminase (GOT), glutamic pyruvic transaminase (GPT), γ-glutamyl transferase (γ-GT), and triglyceride (TG) levels, as well as higher educational levels. All five Mach-L methods had lower estimation errors than LR. The mean area under the receiver operating characteristic curve (AUC) of the Mach-L methods was 0.6669, higher than that of LR (0.5908). Finally, age, TG, GPT, white blood cell count (WBC), uric acid (UA), and platelet count (Plt) were the six most important risk factors selected by Mach-L.
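An AUC can be read as a rank statistic: the probability that a randomly chosen PCOS case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). This minimal pure-Python sketch computes the AUC that way and then averages it over several models. It assumes each model outputs one score per participant and that the reported Mach-L figure is an unweighted mean over models; the function names, labels, and scores are hypothetical toy values, not study data.

```python
from statistics import mean


def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via pairwise comparison: the fraction of (case, control)
    pairs in which the case (label 1) outscores the control (label 0);
    tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(labels: list[int], score_sets: list[list[float]]) -> float:
    """Unweighted mean of the AUCs of several models scored on the same participants."""
    return mean(auc(labels, s) for s in score_sets)


# Toy example: 2 cases followed by 2 controls, scored by two hypothetical models.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.8, 0.7, 0.3, 0.2]
print(auc(labels, model_a))                  # 0.75
print(mean_auc(labels, [model_a, model_b]))  # 0.875
```

An AUC of 0.5 corresponds to ranking no better than chance, so the gap between roughly 0.67 and 0.59 reflects how much more often one approach ranks a case above a control. The pairwise loop is quadratic in sample size; for a study of about 1,100 participants that is still cheap, but a rank-based formula would scale better.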
In a cohort of Chinese women, Mach-L methods outperformed conventional LR. Age was the most important risk factor, followed by TG, GPT, WBC, UA, and Plt.