Balikci Cicek Ipek, Kucukakcali Zeynep
Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280 Malatya, Turkey.
Genes (Basel). 2025 Jul 16;16(7):829. doi: 10.3390/genes16070829.
BACKGROUND/OBJECTIVES: Gastric cancer (GC) remains a significant global health burden because of its high mortality rate and frequent diagnosis at advanced stages. This study aimed to identify reliable diagnostic biomarkers and elucidate the molecular mechanisms underlying GC by integrating transcriptomic data from independent platforms and applying machine learning techniques.

METHODS: Two transcriptomic datasets from the Gene Expression Omnibus were analyzed: GSE26899 (microarray, n = 108) as the discovery dataset and GSE248612 (RNA-seq, n = 12) for validation. Differential expression analysis was conducted using limma and DESeq2, selecting genes with |log2FC| > 1 and adjusted P < 0.05. The top 200 differentially expressed genes (DEGs) were used to develop machine learning models (random forest, logistic regression, neural networks). Functional enrichment analyses (GO, KEGG, Hallmark) were applied to explore relevant biological pathways.

RESULTS: In GSE26899, 627 DEGs were identified (201 upregulated, 426 downregulated), with key genes including [gene names missing in source]. The random forest model demonstrated excellent classification performance (AUC = 0.952). Validation in GSE248612 yielded 738 DEGs. Cross-platform comparison confirmed 55.6% concordance among core genes, highlighting [gene names missing in source]. Enrichment analyses revealed involvement in ECM-receptor interaction, [pathway name missing in source] signaling, epithelial-mesenchymal transition (EMT), and the cell cycle.

CONCLUSIONS: This integrative transcriptomic and machine learning framework effectively identified high-confidence biomarkers for GC. Notably, [gene names missing in source] emerged as consistent, biologically relevant candidates with strong diagnostic performance and potential clinical utility. These findings may aid early detection strategies and guide future therapeutic developments in gastric cancer.
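The DEG selection rule stated in the methods (|log2FC| > 1 and adjusted P < 0.05, then the top 200 genes) can be illustrated with a minimal Python sketch. This is not the authors' code: the study used limma and DESeq2, which are R packages, and the abstract does not say how the top 200 were ranked. The ranking by adjusted P value with ties broken by larger absolute fold change, the names GeneStat, select_degs and split_by_direction, and the toy gene labels and values are all assumptions made for illustration.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """One row of a differential expression result table (hypothetical layout)."""
    gene: str
    log2fc: float  # log2 fold change, tumor vs. normal
    padj: float    # multiple-testing-adjusted P value


def select_degs(stats, lfc_cutoff=1.0, padj_cutoff=0.05, top_n=200):
    """Keep genes with |log2FC| > lfc_cutoff and padj < padj_cutoff.

    Ranking is an assumption: ascending adjusted P value, ties broken
    by larger absolute fold change. At most top_n genes are returned.
    """
    hits = [s for s in stats
            if abs(s.log2fc) > lfc_cutoff and s.padj < padj_cutoff]
    hits.sort(key=lambda s: (s.padj, -abs(s.log2fc)))
    return hits[:top_n]


def split_by_direction(degs):
    """Separate selected genes into upregulated and downregulated lists."""
    up = [s.gene for s in degs if s.log2fc > 0]
    down = [s.gene for s in degs if s.log2fc < 0]
    return up, down


# Toy input with made-up values; gene labels are placeholders.
toy = [
    GeneStat("GENE_A", 2.3, 0.001),
    GeneStat("GENE_B", -1.8, 0.020),
    GeneStat("GENE_C", 0.4, 0.0001),  # fails the fold-change cutoff
    GeneStat("GENE_D", -3.1, 0.200),  # fails the adjusted P cutoff
    GeneStat("GENE_E", 1.0, 0.010),   # |log2FC| equal to 1 fails the strict ">"
]

degs = select_degs(toy)
up, down = split_by_direction(degs)
print([s.gene for s in degs], up, down)
```

In a real analysis the rows would come from an exported limma or DESeq2 results table; the same two cutoffs would then give the upregulated and downregulated counts of the kind reported in the results.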