Department of Computer Science and Engineering, Jahangirnagar University, Dhaka 1342, Bangladesh.
Health Informatics Lab, Department of Computer Science and Engineering, Daffodil International University, Dhaka 1216, Bangladesh.
Int J Environ Res Public Health. 2024 Oct 22;21(11):1392. doi: 10.3390/ijerph21111392.
Lung cancer (LC) is a significant global health issue, and smoking is its most common cause. Recent epidemiological studies have suggested that individuals who smoke are more susceptible to COVID-19. In this study, we investigated the influence of smoking and COVID-19 on LC using bioinformatics and machine learning approaches. We compared the differentially expressed genes (DEGs) across LC, smoking, and COVID-19 datasets and identified 26 down-regulated and 37 up-regulated genes shared between LC and smoking, and 7 down-regulated and 6 up-regulated genes shared between LC and COVID-19. Integrating these datasets and analyzing the resulting protein-protein interaction network identified ten hub genes (SLC22A18, CHAC1, ROBO4, TEK, NOTCH4, CD24, CD34, SOX2, PITX2, and GMDS). We used the WGCNA R package to construct correlation networks for the shared genes and to investigate the relationships among them. We also examined the association of these genes with patient outcomes through survival curve analyses. Gene ontology and pathway analyses were performed to identify potential therapeutic targets for LC in smokers and COVID-19 patients. Finally, machine learning algorithms were applied to TCGA RNA-seq data for LC to assess the performance of the shared genes and the ten hub genes, both of which demonstrated high performance. The identified hub genes and molecular pathways can be used to develop potential therapeutic targets for smoking- and COVID-19-associated LC.
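The shared-DEG comparison described in the abstract can be illustrated with a minimal sketch. It assumes each dataset has already been reduced to a mapping from gene symbol to log2 fold change, and it treats a gene as shared only when it is regulated in the same direction in both datasets. The function names, the threshold of 1.0, and the toy gene names and values are hypothetical and are not taken from the study, which does not state its cutoffs here.

```python
from typing import Dict, Set, Tuple


def split_by_direction(
    log2fc: Dict[str, float], threshold: float = 1.0
) -> Tuple[Set[str], Set[str]]:
    """Split genes into up- and down-regulated sets.

    The threshold is an assumed cutoff on |log2 fold change|,
    not a value reported in the abstract.
    """
    up = {gene for gene, fc in log2fc.items() if fc >= threshold}
    down = {gene for gene, fc in log2fc.items() if fc <= -threshold}
    return up, down


def shared_degs(
    first: Dict[str, float], second: Dict[str, float], threshold: float = 1.0
) -> Dict[str, Set[str]]:
    """Return genes regulated in the same direction in both datasets."""
    first_up, first_down = split_by_direction(first, threshold)
    second_up, second_down = split_by_direction(second, threshold)
    return {"up": first_up & second_up, "down": first_down & second_down}


# Toy inputs with placeholder gene names and made-up fold changes.
toy_lc = {"GENE_A": 2.1, "GENE_B": -1.8, "GENE_C": 0.3, "GENE_D": -2.5}
toy_smoking = {"GENE_A": 1.4, "GENE_B": -1.2, "GENE_C": 1.9, "GENE_D": 1.1}

shared = shared_degs(toy_lc, toy_smoking)
print(sorted(shared["up"]), sorted(shared["down"]))
```

In this toy case GENE_D is excluded because it is down-regulated in one dataset and up-regulated in the other. Whether the study applied the same direction-matching rule or additional significance filters (for example, adjusted p-values) is not stated in the abstract.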