Li Yajing, Deng Hongru
Department of Vascular Surgery, Fu Xing Hospital, Capital Medical University (FXH-CMU), Beijing 100038, China.
Int J Anal Chem. 2024 Nov 22;2024:2202321. doi: 10.1155/ianc/2202321. eCollection 2024.
This study used bioinformatics and machine learning algorithms to screen for key genes involved in venous thromboembolism (VTE) and to explore the relationship between these biomarkers and immune cell infiltration. The gene expression profile GSE19151 was downloaded from the GEO database. Differential expression analysis with the limma package identified genes that were differentially expressed between VTE and normal samples. The biological functions of these genes were then investigated by Gene Ontology (GO) analysis in R. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and gene set enrichment analysis (GSEA) were also performed to identify key signaling pathways. Furthermore, machine learning techniques were employed to determine hub gene signatures related to VTE, and ROC curves were used to validate the findings. Single-sample gene set enrichment analysis (ssGSEA) was applied to compare immune infiltration between healthy and VTE samples. Lastly, the Spearman correlation coefficient was used to assess the relationship between hub gene expression and immune cell infiltration. A total of 628 differentially expressed genes (DEGs) were found between the VTE and normal samples. GO analysis identified protein polyubiquitination, lysosomal lumen acidification, organellar ribosome, mitochondrial ribosome, ammonium transmembrane transporter activity, and immunoglobulin binding as the terms most enriched for DEGs. KEGG pathway analysis revealed that DEGs were enriched in ribosome, COVID-19, viral infection, oxidative phosphorylation, Parkinson's disease, nonalcoholic fatty liver disease, apoptosis, and cancer. According to GSEA, the KEGG pathways most prominently associated with VTE were ribosome, Parkinson's disease, oxidative phosphorylation, Alzheimer's disease, and Huntington's disease.
Integrative machine learning analysis identified DLST and LSP1 as hub gene signatures in VTE, and ROC curves confirmed their diagnostic value. ssGSEA indicated a significant difference in the degree of immune cell infiltration between VTE and normal samples, and the expression of DLST and LSP1 correlated positively with the abundance of some immune cell types. The R packages, code, and analysis results used in this paper are available at https://github.com/doctorlaby/my-project. This study is the first to use machine learning techniques to identify DLST and LSP1 as significant biomarkers of VTE. These findings offer new insight into the underlying causes of VTE and potential treatments for affected patients.
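
The two validation steps named in the abstract, ROC analysis of a hub gene and Spearman correlation between gene expression and immune infiltration, can be illustrated with a minimal sketch. This is not the authors' code (their analysis was carried out in R). It is a standard-library Python illustration, and the function names (`average_ranks`, `spearman`, `roc_auc`) and all numbers are invented for the example. The AUC is computed from rank sums (the Mann-Whitney U statistic), and Spearman's coefficient as the Pearson correlation of the ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for cases (e.g. VTE), 0 for controls.
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Invented toy data: expression of one hypothetical hub gene in six samples,
# case/control labels, and an ssGSEA-style score for one immune cell type.
expression = [5.1, 6.3, 7.0, 7.2, 6.9, 7.5]
labels = [0, 0, 0, 1, 1, 1]
immune_score = [0.20, 0.35, 0.30, 0.50, 0.60, 0.70]

print(f"AUC = {roc_auc(expression, labels):.3f}")
print(f"Spearman rho = {spearman(expression, immune_score):.3f}")
```

An AUC near 1 means the gene's expression separates cases from controls well, while a value near 0.5 means it does no better than chance. A positive Spearman coefficient means samples with higher expression tend to have higher infiltration scores. It does not imply a linear or causal relationship.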