Department of Emergency, Hebei General Hospital, Shijiazhuang, 050051, China.
Comb Chem High Throughput Screen. 2022;25(1):21-28. doi: 10.2174/1386207323666201204130031.
Sepsis is a life-threatening disease caused by a dysregulated host response to infection and is the major cause of death among patients in the intensive care unit (ICU).
Early diagnosis of sepsis could significantly reduce in-hospital mortality. Although sepsis originates from infection, its development follows its own pathophysiological course and patterns, and varies with sex, health status and other factors. Hence, analyzing large datasets with bioinformatics tools and machine learning is a promising approach to early diagnosis.
We collected miRNA and mRNA expression data of sepsis blood samples from the Gene Expression Omnibus (GEO) and ArrayExpress databases, screened for differentially expressed genes (DEGs) using R software, predicted miRNA targets with TargetScanHuman and miRTarBase, and performed Gene Ontology (GO) term and KEGG pathway enrichment analyses on the overlapping DEGs. The STRING database and Cytoscape were used to build a protein-protein interaction (PPI) network and predict hub genes. We then constructed a Random Forest model from the hub genes to classify sample type.
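The abstract does not spell out how the "overlapping DEGs" were derived. One plausible reading, assumed here, is that they are the differentially expressed mRNAs that are also predicted targets of the differentially expressed miRNAs. A minimal Python sketch of that intersection follows; the function name, the miRNA labels and the gene names are hypothetical placeholders, not data from the study.

```python
def overlapping_degs(mrna_degs, mirna_targets):
    """Return the mRNA DEGs that are also predicted miRNA targets.

    mrna_degs:     iterable of gene symbols flagged as differentially expressed
    mirna_targets: dict mapping each miRNA to the set of its predicted targets
                   (e.g. merged TargetScanHuman / miRTarBase predictions)
    """
    # Pool the predicted targets of all miRNAs into a single set.
    all_targets = set().union(*mirna_targets.values())
    # Keep only genes present in both lists; sort for a stable result.
    return sorted(set(mrna_degs) & all_targets)


# Hypothetical placeholder inputs, for illustration only.
example_degs = ["GENE_A", "GENE_B", "GENE_C"]
example_targets = {
    "miR-x": {"GENE_B", "GENE_D"},
    "miR-y": {"GENE_C"},
}
print(overlapping_degs(example_degs, example_targets))
```

The resulting gene list would then be the input to enrichment analysis and PPI network construction.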
Bioinformatic analysis of the GEO datasets revealed 46 overlapping DEGs in sepsis. PPI network analysis identified five hub genes: SOCS3, KBTBD6, FBXL5, FEM1C and WSB1. A Random Forest model based on these five hub genes was used to classify samples in the GSE95233 and GSE95233 datasets; the areas under the ROC curve (AUC) were 0.900 and 0.7988, respectively, supporting the efficacy of the model.
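For readers unfamiliar with the metric, the AUC of a ROC curve equals the probability that a randomly chosen positive (sepsis) sample receives a higher model score than a randomly chosen negative (control) sample, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. It is not the authors' code, and the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive, e.g. sepsis)
    scores: iterable of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented example: three positives, two negatives.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(roc_auc(example_labels, example_scores), 4))  # 5 of 6 pairs ordered correctly -> 0.8333
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.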
The integrated analysis of gene expression in sepsis and the effective Random Forest model built in this study may provide promising diagnostic methods for sepsis.