Endocrine Diabetes Department, Cangzhou Central Hospital, Cangzhou, Hebei, 061000, China.
Curr Pharm Des. 2021;27(24):2827-2833. doi: 10.2174/1381612826666201217143403.
Type 1 diabetes is a chronic autoimmune disease characterized by insulin deficiency caused by pancreatic β-cell loss, which leads to hyperglycaemia.
There is currently no cure for this disease, and patients require lifelong insulin injections. Identifying potential diagnostic biomarkers by analysing large datasets with bioinformatics tools and machine learning is important for type 1 diabetes.
We collected two type 1 diabetes mRNA expression datasets of peripheral blood samples from the Gene Expression Omnibus (GEO), screened differentially expressed genes (DEGs) using R, and performed Gene Ontology (GO) and KEGG pathway enrichment analysis on the DEGs. The STRING database and Cytoscape were then used to build a protein-protein interaction (PPI) network and predict hub genes. Using the hub genes, we constructed a logistic regression model to classify sample type.
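The overlap and hub-gene steps described above can be illustrated with a minimal sketch. It assumes that hub genes are ranked by node degree in the PPI network, which is only one of several criteria used in Cytoscape-based workflows; the abstract does not state which criterion the authors applied. The function names, gene labels and edges below are hypothetical placeholders, not the study's DEG lists or STRING interactions.

```python
from collections import Counter


def overlapping_degs(degs_a, degs_b):
    """Return the genes called differentially expressed in both datasets."""
    return sorted(set(degs_a) & set(degs_b))


def rank_by_degree(edges, genes):
    """Count PPI edges per gene, restricted to `genes`, highest degree first.

    Assumes each undirected edge appears once in `edges`.
    """
    keep = set(genes)
    degree = Counter({g: 0 for g in keep})
    for a, b in edges:
        if a in keep and b in keep and a != b:
            degree[a] += 1
            degree[b] += 1
    # Sort by descending degree, then by name for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy inputs (hypothetical; NOT the study's gene lists or STRING edges).
degs_first = ["G1", "G2", "G3", "G4", "G5"]
degs_second = ["G2", "G3", "G4", "G6"]
toy_edges = [("G2", "G3"), ("G3", "G4"), ("G3", "G6"), ("G1", "G2")]

shared = overlapping_degs(degs_first, degs_second)
ranking = rank_by_degree(toy_edges, shared)
print(shared)
print(ranking)
```

In this toy network only edges whose two endpoints are both shared DEGs are counted, so the ranking reflects connectivity within the overlapping gene set rather than in the full interactome.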
Bioinformatic analysis of the GEO datasets revealed 92 and 75 DEGs in GSE50098 and GSE9006, respectively, with 10 DEGs shared between the two. The PPI network of these 10 DEGs yielded 7 hub genes: EGR1, LTF, CXCL1, TNFAIP6, PGLYRP1, CHI3L1 and CAMP. We built a logistic regression model based on these hub genes and then reduced it to a 3-gene model (LTF, CAMP and PGLYRP1). The area under the curve (AUC) was 0.8452 for the training set (GSE50098) and 0.8083 for the testing set (GSE9006), indicating good discriminative performance.
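To make the AUC figures concrete, the following sketch scores samples with a 3-feature logistic model and computes the AUC as the probability that a randomly chosen patient sample receives a higher score than a randomly chosen control (ties count as one half). The coefficients, intercept and expression values are invented for illustration; they are not the fitted parameters or data from the study, and the result is unrelated to the reported 0.8452 and 0.8083.

```python
import math


def predict_proba(weights, intercept, x):
    """Logistic model: probability that sample x belongs to the positive class."""
    z = intercept + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients for three expression features (illustration only).
weights = [0.8, 0.6, 0.4]
intercept = -1.0

# Toy samples: 1 = type 1 diabetes, 0 = control (invented values).
samples = [
    ([1.0, 1.0, 1.0], 1),
    ([0.5, 0.5, 0.0], 1),
    ([2.0, 0.0, 1.0], 1),
    ([0.0, 0.0, 0.0], 0),
    ([1.0, 0.0, 0.0], 0),
    ([0.0, 0.5, 0.0], 0),
]

labels = [y for _, y in samples]
scores = [predict_proba(weights, intercept, x) for x, _ in samples]
auc_value = auc(labels, scores)
print(round(auc_value, 4))
```

Because the AUC depends only on how samples are ranked, any monotone transformation of the scores (for example, using the linear predictor instead of the probability) gives the same value.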
Integrated bioinformatic analysis of gene expression in type 1 diabetes, together with the logistic regression model built in this study, may provide a promising diagnostic approach for type 1 diabetes.