Hong Jin Hwa, Ouh Yung Taek, Jeong Sohyeon, Oh Yoonji, Cho Hyun Woong, Lee Jae Kwan, Kim Hayeon, Kim Chungyeul, Roh Sanghyun, Kim Eun Na, Chun Yikyeong, Gim Jeong-An
Department of Obstetrics and Gynecology, Guro Hospital, Korea University College of Medicine, Seoul, Republic of Korea.
Department of Obstetrics and Gynecology, Ansan Hospital, Korea University College of Medicine, Ansan, Republic of Korea.
Front Genet. 2025 Apr 28;16:1569122. doi: 10.3389/fgene.2025.1569122. eCollection 2025.
The prognosis of endometrial cancer (EC) varies within each molecular subtype because of histological and molecular factors. This study uses omics datasets and machine learning to identify biomarkers associated with EC recurrence in different molecular subtypes.
Differentially expressed genes (DEGs) and differentially methylated regions (DMRs) were identified by t-tests between recurrence and non-recurrence groups, using DNA methylation, RNA-sequencing, and common variant data from 116 EC samples in The Cancer Genome Atlas (TCGA). The results were visualized with volcano plots and heat maps, and decision trees and random forests were used to classify and stratify the samples.
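The group comparison described above can be illustrated with a minimal sketch. It assumes Welch's unequal-variance t statistic (the abstract does not say which t-test variant was used), and the feature names and values are hypothetical toy data, not TCGA measurements. The p-values needed for a volcano plot would additionally require a t-distribution CDF, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_features(data, labels):
    """Rank features by |t| between recurrence and non-recurrence samples.

    data:   {feature_name: [one value per sample]}
    labels: [True for recurrence, False for non-recurrence], same sample order
    """
    ranked = []
    for name, values in data.items():
        rec = [v for v, y in zip(values, labels) if y]
        non = [v for v, y in zip(values, labels) if not y]
        t, df = welch_t(rec, non)
        ranked.append((name, t, df))
    return sorted(ranked, key=lambda r: abs(r[1]), reverse=True)


# Hypothetical toy data: three recurrence and three non-recurrence samples.
labels = [True, True, True, False, False, False]
data = {
    "feature_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # differs between groups
    "feature_B": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],  # identical in both groups
}

ranked = rank_features(data, labels)
for name, t, df in ranked:
    print(f"{name}: t = {t:.3f}, df = {df:.1f}")
```

In a real analysis each feature would be one gene's expression or one region's methylation level, and the resulting p-values would need multiple-testing correction before any feature is called differential.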
A machine learning analysis combined with box plots showed that, in the copy number-high (CN-H) recurrence group, PARD6G-AS1 methylation was decreased, CSMD1 methylation was increased, and TESC expression was higher than in the non-recurrence group. In the copy number-low (CN-L) recurrence group, CD44 expression was elevated. Further validation using TCGA clinical data confirmed PARD6G-AS1 hypomethylation and CD44 overexpression as significant indicators of recurrence (p = 0.006 and p = 0.02, respectively), and both were linked to advanced stage and lymph node metastasis.
The study concludes that PARD6G-AS1 hypomethylation and CD44 overexpression are potential predictors of recurrence in CN-H and CN-L EC patients, respectively.