Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis.

Affiliations

Department of Oral Biology, Faculty of Dentistry, Mansoura University, Mansoura 35116, Egypt.

Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt.

Publication information

Int J Environ Res Public Health. 2022 Oct 26;19(21):13890. doi: 10.3390/ijerph192113890.

Abstract

The molecular basis of diabetes mellitus has yet to be fully elucidated. We aimed to identify the most frequently reported and differentially expressed genes (DEGs) in diabetes by using bioinformatics approaches. Text mining was used to screen 40,225 article abstracts from the diabetes literature. These studies highlighted 5939 diabetes-related genes spread across 22 human chromosomes, with 112 genes mentioned in more than 50 studies. Among these, ten genes [gene symbols missing from this copy] were mentioned in more than 200 articles. These genes are correlated with the regulation of glycogen and polysaccharide, adipogenesis, AGE/RAGE, and macrophage differentiation. Three datasets (44 patients and 57 controls) were subjected to gene expression analysis. The analysis revealed 135 significant DEGs, of which the top 10 were listed by name [gene symbols missing from this copy]. These genes were enriched in aerobic respiration, the T-cell antigen receptor pathway, the tricarboxylic acid metabolic process, the vitamin D receptor pathway, toll-like receptor signaling, and the endoplasmic reticulum (ER) unfolded protein response. The results of the text mining and gene expression analyses were used as attribute values for machine learning (ML) analysis. The decision tree, extra-tree regressor, and random forest algorithms were used in the ML analysis to identify unique markers that could serve as diagnostic tools for diabetes. These algorithms produced prediction models with accuracies ranging from 0.6364 to 0.88 and an overall confidence interval (CI) of 95%. Thirty-nine biomarkers could distinguish diabetic from non-diabetic patients, 12 of which were repeated multiple times. The majority of these genes are associated with stress response, signalling regulation, locomotion, cell motility, growth, and muscle adaptation. The machine learning algorithms highlighted one gene [gene symbol missing from this copy] as a biomarker for the early detection of diabetes. Our data mining and gene expression analysis have provided useful information about potential biomarkers in diabetes.
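
The text-mining step described in the abstract (counting how many abstracts mention each gene, then keeping genes above a threshold such as 50 or 200 articles) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `count_gene_mentions` and `genes_above`, the whole-token matching rule, and the placeholder symbols `GENEA`, `GENEB`, and `GENEA2` are all assumptions made for illustration, and the toy corpus is invented.

```python
import re
from collections import Counter


def count_gene_mentions(abstracts, gene_symbols):
    """Count, for each gene symbol, how many abstracts mention it at least once."""
    # Whole-token, case-sensitive patterns so "GENEA" does not match inside "GENEA2".
    patterns = {
        gene: re.compile(r"(?<![A-Za-z0-9])" + re.escape(gene) + r"(?![A-Za-z0-9])")
        for gene in gene_symbols
    }
    counts = Counter()
    for text in abstracts:
        for gene, pattern in patterns.items():
            if pattern.search(text):
                counts[gene] += 1  # one count per abstract, not per occurrence
    return counts


def genes_above(counts, min_articles):
    """Return (gene, count) pairs with count > min_articles, most frequent first."""
    hits = [(gene, n) for gene, n in counts.items() if n > min_articles]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy corpus with placeholder symbols (hypothetical, not data from the study).
toy_abstracts = [
    "GENEA and GENEB were upregulated in diabetic patients.",
    "GENEA expression correlated with insulin resistance; GENEA was also elevated.",
    "No association was found for GENEA2.",
]
toy_counts = count_gene_mentions(toy_abstracts, ["GENEA", "GENEB", "GENEA2"])
frequent = genes_above(toy_counts, 1)
```

With a real corpus, `genes_above(counts, 50)` and `genes_above(counts, 200)` would correspond to the two thresholds quoted in the abstract. Counting once per abstract, rather than once per occurrence, matches the abstract's phrasing of genes "mentioned in more than 50 studies".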

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11b3/9656783/a76e85fd5dd9/ijerph-19-13890-g001.jpg
