Chen Ji-Ying, Chen Wu-Jie, Zhu Zhi-Ying, Xu Shi, Huang Li-Lan, Tan Wen-Qing, Zhang Yong-Gang, Zhao Yan-Li
Department of Obstetrics and Gynecology, Shenzhen Longhua District Central Hospital, Shenzhen, China.
Department of Medical Laboratory, Shenzhen Longhua District Central Hospital, Shenzhen, China.
PLoS One. 2025 Jan 7;20(1):e0313494. doi: 10.1371/journal.pone.0313494. eCollection 2025.
Polycystic ovary syndrome (PCOS) is a primary endocrine disorder that affects premenopausal women and involves metabolic dysregulation. We aimed to screen for serum biomarkers in PCOS patients using untargeted lipidomics and ensemble machine learning. Serum samples from PCOS patients and non-PCOS subjects were collected for untargeted lipidomics analysis. We analyzed the classification of the differential lipid metabolites and their associations with clinical indexes, and then applied an ensemble machine learning workflow to the untargeted lipidomics data: data preprocessing, pre-screening by statistical tests, secondary screening by ensemble learning methods, biomarker verification and evaluation, and construction and verification of a diagnostic panel model. The results indicated that the differential lipid metabolites not only differed between groups but were also closely associated with the corresponding clinical indexes. PI (18:0/20:3)-H and PE (18:1p/22:6)-H were identified as candidate biomarkers. Three machine learning models (logistic regression, random forest, and support vector machine) showed that the screened biomarkers had better classification ability and performance. In addition, the correlation between the candidate biomarkers was low, indicating little overlap between the selected biomarkers and a more optimized panel combination. The constructed diagnostic panel model achieved an AUC of 0.815 on the test set, with an accuracy of 0.74, a specificity of 0.88, and a sensitivity of 0.7. This study demonstrated the applicability and robustness of machine learning algorithms for analyzing lipid metabolism data in efficient and reliable biomarker screening. PI (18:0/20:3)-H and PE (18:1p/22:6)-H showed great potential in diagnosing PCOS.
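For readers less familiar with the test-set metrics reported in the abstract (AUC, accuracy, specificity, sensitivity), the following is a minimal illustrative sketch of how such metrics are conventionally computed for a binary classifier. It is not the authors' code: the function names, the label coding (1 = PCOS, 0 = non-PCOS), the 0.5 decision threshold, and the toy scores are all assumptions made for illustration and do not come from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); labels assumed coded as 1 = PCOS, 0 = non-PCOS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def panel_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and model scores, invented for illustration only (not study data).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(panel_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are typically reported together.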