Development of machine learning models for diagnostic biomarker identification and immune cell infiltration analysis in PCOS.

Authors

Chen Wenxiu, Miao Jianliang, Chen Jingfei, Chen Jianlin

Affiliations

Reproductive Medicine Center, Department of Obstetrics and Gynecology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, China.

First Affiliated Hospital of Dalian Medical University, Dalian Medical University, Dalian, China.

Publication information

J Ovarian Res. 2025 Jan 3;18(1):1. doi: 10.1186/s13048-024-01583-1.

Abstract

BACKGROUND

Polycystic ovary syndrome (PCOS) is a common endocrine disorder affecting women of reproductive age. It is characterized by hyperandrogenemia, oligo- or anovulation, and polycystic ovarian morphology, and it significantly impairs quality of life. However, the practical use of machine learning (ML) in PCOS diagnosis has been limited by small datasets and by the algorithmic models applied. To address this gap, we increased the sample size and used two ML algorithms to identify and validate diagnostic biomarkers, and we explored immune cell infiltration patterns in PCOS.

METHODS

We performed RNA-seq analysis on granulosa cells from 13 normal controls and 25 women with PCOS. The data from our study were combined with data from publicly available databases, and batch effects were corrected using the 'sva' package in R. Differential expression analysis was performed to identify genes that differed significantly between the two groups. These differentially expressed genes (DEGs) were then analyzed for Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. LASSO and SVM-RFE were each applied to the DEGs for feature selection, and hub genes were defined as the intersection of the two results. Receiver operating characteristic (ROC) curves were used to assess the accuracy of the SVM and XGBoost models. CIBERSORT analysis was performed to determine the relative abundances of immune cell populations. Gene set enrichment analysis (GSEA) was performed to illustrate the expression patterns of genes within highly enriched functional pathways. RT-qPCR was used to validate the expression of the hub genes.
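The hub gene step described above, keeping only genes chosen by both LASSO and SVM-RFE, reduces to a set intersection once each method has produced its list. The following is a minimal sketch of that final step only. The function name `intersect_selected_genes` and the placeholder entries (`GENE_A`, `GENE_B`, `GENE_C`) are hypothetical, and the input lists are illustrative rather than the study's actual LASSO or SVM-RFE outputs.

```python
def intersect_selected_genes(lasso_genes, svm_rfe_genes):
    """Return genes selected by both methods, keeping the order of the first list."""
    svm_rfe_set = set(svm_rfe_genes)
    return [gene for gene in lasso_genes if gene in svm_rfe_set]


# Illustrative inputs only: GENE_A, GENE_B and GENE_C are made-up placeholders,
# and neither list reproduces the study's real feature-selection output.
lasso_genes = ["CNTN2", "CASR", "GENE_A", "CACNB3", "MFAP2"]
svm_rfe_genes = ["MFAP2", "GENE_B", "CASR", "CNTN2", "CACNB3", "GENE_C"]

hub_genes = intersect_selected_genes(lasso_genes, svm_rfe_genes)
print(hub_genes)
```

Requiring agreement between two selection methods is a conservative choice: a gene that only one method favors is dropped, which tends to shrink the candidate list.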

RESULTS

A total of 824 DEGs were found between the normal control and PCOS groups, including 376 upregulated and 448 downregulated genes. KEGG enrichment analysis associated these DEGs with endocytosis, Salmonella infection, and focal adhesion. By overlapping the LASSO and SVM-RFE results, we identified four hub genes (CNTN2, CASR, CACNB3, MFAP2) that were significantly associated with the PCOS group. In the validation set, the SVM and XGBoost models yielded AUC values of 0.795 and 0.875, respectively, indicating the potential of these genes as diagnostic biomarkers. Consistent with the data analysis, RT-qPCR on human granulosa cells confirmed the upregulation of CNTN2, CASR, CACNB3, and MFAP2 in PCOS. Furthermore, CIBERSORT analysis revealed a significant reduction in CD4 memory resting T cells in the PCOS group compared with the normal control group (P < 0.05).
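For readers unfamiliar with the AUC values reported above, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions for illustration; they are not the study's data and do not reproduce its 0.795 or 0.875 values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive (e.g. PCOS), 0 for negative (e.g. control).
    scores: classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores: 13 of the 16 pairs are ranked correctly.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.8125
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as those reported here sit between the two.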

CONCLUSIONS

This study identified CNTN2, CASR, CACNB3, and MFAP2 as potential diagnostic biomarkers for PCOS, which supports existing research on hub genes. Furthermore, the analysis of immune cell infiltration pointed to a significant involvement of CD4 memory resting T cells in the onset and progression of PCOS. These findings shed light on potential mechanisms underlying PCOS pathogenesis and provide insights for future research and therapeutic interventions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5d6/11697806/9f5bfec0ce2b/13048_2024_1583_Fig1_HTML.jpg
