Differential gene expression analysis and machine learning identified structural, TFs, cytokine and glycoproteins, including SOX2, TOP2A, SPP1, COL1A1, and TIMP1 as potential drivers of lung cancer.

Author information

Shah Syed Naseer Ahmad, Parveen Rafat

Affiliation

Department of Computer Science, Jamia Millia Islamia, New Delhi, India.

Publication information

Biomarkers. 2025 Mar;30(2):200-215. doi: 10.1080/1354750X.2025.2461698. Epub 2025 Feb 10.

DOI: 10.1080/1354750X.2025.2461698
PMID: 39888730

Abstract

BACKGROUND

Lung cancer is a primary global health concern, responsible for a considerable portion of cancer-related fatalities worldwide. Understanding its molecular complexities is crucial for identifying potential targets for treatment. The goal is to slow disease progression and intervene early to prevent the development of advanced lung cancer cases. Hence, there's an urgent need for new biomarkers that can detect lung cancer in its early stages.

METHODS

The study conducted RNA-Seq analysis of lung cancer samples from the publicly available SRA database (NCBI SRP009408), including both control and tumour samples. Genes differentially expressed between tumour and healthy tissues were identified using R and Bioconductor. Five machine learning (ML) techniques (Random Forest, Lasso, XGBoost, Gradient Boosting and Elastic Net) were employed to pinpoint significant genes, followed by three classifiers: Multilayer Perceptron (MLP), Support Vector Machines (SVM) and k-Nearest Neighbours (k-NN). Gene ontology and pathway analyses were performed on the significant differentially expressed genes (DEGs). The top genes from the DEG and machine learning analyses were combined for protein-protein interaction (PPI) analysis, identifying 10 hub genes essential for lung cancer progression.
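
To make the differential-expression step concrete, the following is a minimal sketch in Python of how genes can be split into upregulated and downregulated sets by log2 fold change. It is not the authors' pipeline: the abstract states only that R and Bioconductor were used, and a real analysis would fit a statistical model and correct for multiple testing. The function names, the pseudocount, the threshold of 1.0 and the toy counts are all assumptions made for illustration.

```python
import math
from statistics import mean


def log2_fold_changes(counts, tumour_ids, control_ids, pseudocount=1.0):
    """Return {gene: log2((mean tumour + pc) / (mean control + pc))}.

    `counts` maps gene -> {sample_id: normalised count}. The pseudocount
    (an assumption) keeps the ratio defined when a mean is zero.
    """
    result = {}
    for gene, by_sample in counts.items():
        tumour_mean = mean(by_sample[s] for s in tumour_ids)
        control_mean = mean(by_sample[s] for s in control_ids)
        ratio = (tumour_mean + pseudocount) / (control_mean + pseudocount)
        result[gene] = math.log2(ratio)
    return result


def split_by_direction(lfc, threshold=1.0):
    """Split genes into (upregulated, downregulated) by |log2FC| >= threshold."""
    up = sorted(g for g, v in lfc.items() if v >= threshold)
    down = sorted(g for g, v in lfc.items() if v <= -threshold)
    return up, down


# Hypothetical, made-up counts; gene and sample names are placeholders.
toy_counts = {
    "GENE_A": {"T1": 70, "T2": 56, "C1": 7, "C2": 7},
    "GENE_B": {"T1": 3, "T2": 3, "C1": 15, "C2": 15},
    "GENE_C": {"T1": 9, "T2": 11, "C1": 10, "C2": 10},
}

lfc = log2_fold_changes(toy_counts, ["T1", "T2"], ["C1", "C2"])
up, down = split_by_direction(lfc)
print("up:", up, "down:", down)
```

A fold-change cut-off alone says nothing about statistical significance, so in practice it would be paired with an adjusted p-value from a dedicated differential-expression package.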

RESULTS

The integrated analysis of ML and DEGs revealed the significance of specific genes in lung cancer samples and identified the top 5 upregulated genes (COL11A1, TOP2A, SULF1, DIO2, MIR196A2) and the top 5 downregulated genes (PDK4, FOSB, FLYWCH1, CYB5D2, MIR328), along with their associated genes implicated in pathways or co-expression networks. Among the various algorithms employed, Random Forest and XGBoost proved effective in identifying common genes, underscoring their potential significance in lung cancer pathogenesis. The MLP exhibited the highest accuracy in classifying samples using all genes. Additionally, the protein-protein interaction (PPI) analysis identified 10 hub genes that are pivotal in lung cancer pathogenesis: COL1A1, SOX2, SPP1, THBS2, POSTN, COL5A1, COL11A1, TIMP1, TOP2A and PKP1.
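
The hub-gene step can be illustrated with a short Python sketch that pools the DEG-derived and ML-derived gene lists and ranks the pooled genes by how many interaction partners they have. The abstract does not say how hubs were scored, so ranking by node degree is an assumption here (it is one common choice), and `rank_hubs`, the placeholder gene names and the edge list are hypothetical rather than taken from the study.

```python
from collections import Counter


def rank_hubs(edges, candidates, top_n=10):
    """Rank candidate genes by degree in an undirected interaction network.

    Edges touching a non-candidate gene, self-loops and repeated pairs
    are ignored. Ties are broken alphabetically for a stable result.
    """
    candidates = set(candidates)
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b or a not in candidates or b not in candidates:
            continue
        pair = frozenset((a, b))
        if pair in seen:
            continue  # same interaction listed twice
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Placeholder gene lists and invented edges, for illustration only.
deg_top_genes = {"G1", "G2", "G3"}
ml_top_genes = {"G2", "G3", "G4"}
pooled = deg_top_genes | ml_top_genes

toy_edges = [
    ("G1", "G2"),
    ("G2", "G3"),
    ("G2", "G4"),
    ("G3", "G4"),
    ("G4", "G3"),  # duplicate of the previous pair
    ("G5", "G1"),  # G5 is not a candidate, so this edge is skipped
]

hubs = rank_hubs(toy_edges, pooled, top_n=2)
print(hubs)
```

In a real workflow the edge list would come from an interaction database export, and other centrality measures could replace plain degree.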

CONCLUSION

The study contributes to the early prediction of lung cancer by identifying potential biomarkers that could enhance early diagnosis and pave the way for practical clinical applications in the future. Integrating DEGs and machine learning-derived significant genes for PPI analysis offers a robust approach to uncovering critical molecular targets for lung cancer treatment.
