Screening of Important Markers in Peripheral Blood Mononuclear Cells to Predict Female Osteoporosis Risk Using LASSO Regression Algorithm and SVM Method.

Author information

Tang Hongwei, Han Qingtian, Yin Yong

Affiliation

Department of Orthopedics, Jiading District Central Hospital Affiliated to Shanghai University of Medicine & Health Sciences, Shanghai, China.

Publication information

Evol Bioinform Online. 2022 Jan 28;18:11769343221075014. doi: 10.1177/11769343221075014. eCollection 2022.

Abstract

BACKGROUND

Osteoporosis is a bone disease that increases patients' risk of fracture. We aimed to identify robust osteoporosis-related marker genes using different bioinformatic methods and multiple datasets.

METHODS

Three datasets from the Gene Expression Omnibus (GEO) were analyzed separately. In the first dataset, significantly differentially expressed genes (DEGs) between the high hip and low hip bone mineral density (BMD) groups were identified and subjected to Gene Ontology (GO), gene set enrichment analysis (GSEA) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses to investigate the biological processes differentially enriched between the two groups. Finally, least absolute shrinkage and selection operator (LASSO) regression, a support vector machine (SVM) model and a protein-protein interaction (PPI) regulatory network were used to generate robust marker genes for downstream transcription factor (TF)-target and miRNA-target prediction.
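The LASSO step selects a sparse gene set by shrinking most coefficients to exactly zero, so the genes with non-zero coefficients form the candidate combination. A minimal pure-Python sketch of cyclic coordinate-descent LASSO follows, under stated assumptions: it uses a least-squares loss on a 0/1 group label, whereas the study's actual loss, penalty value and software are not given here (a logistic LASSO with a cross-validated penalty is a common choice for a binary outcome). The function names and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in each LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of n rows, each with p feature values (e.g. gene expression levels).
    y: list of n responses (here assumed 0/1 group labels, a simplification).
    Features and response are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # (1/n) * sum_i x_ij^2: the denominator of every update for feature j.
    scale = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    beta = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ beta, and beta starts at zero
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if scale[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the partial residual (own term added back).
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + scale[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / scale[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical toy data: 6 samples x 2 "genes"; label 1 = low hip BMD.
X = [[1, 0.5], [2, -0.3], [3, 0.2], [4, -0.1], [5, 0.4], [6, -0.5]]
y = [0, 0, 0, 1, 1, 1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # [0]: only the first gene survives the penalty
```

Larger values of `lam` remove more genes; in practice the penalty would be tuned, for example by cross-validation, rather than fixed by hand as in this toy example.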

RESULTS

Several DEGs were identified between the high hip BMD group and the low hip BMD group, and metabolism-related pathways such as metabolic pathways, carbon metabolism, and glyoxylate and dicarboxylate metabolism were enriched among these DEGs. Using LASSO regression analysis, 8 differentially expressed genes (, and ) in GSE62402 were identified as the optimal differential gene combination. Moreover, SVM validation in the GSE56814 and GSE56815 datasets showed that this characteristic gene combination had high diagnostic performance, with model AUCs of 0.899 for GSE56814 and 0.921 for GSE56815. Subcellular localization analysis of the 8 genes showed that 4 of the encoded proteins are located in the cytoplasm, 3 in the nucleus, and 1 in the mitochondria. Additionally, TF-target and miRNA-target prediction was performed for 5 genes ( and ) selected from the PPI network to identify their related TFs and miRNAs.
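The reported AUC values (0.899 and 0.921) summarize how well the SVM decision values separate low hip BMD samples from high hip BMD samples. As a hedged illustration, the AUC can be computed directly as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the toy labels and scores below are hypothetical and not taken from the study's data.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative).

    Ties count as 1/2, matching the Mann-Whitney U formulation.
    labels: 0/1 per sample (1 = case, e.g. low hip BMD).
    scores: classifier outputs per sample (e.g. SVM decision values).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    # Compare every case with every control (O(n_pos * n_neg), fine for small cohorts).
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 case/control pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is used here for transparency; a rank-based computation gives the same value in O(n log n) for larger datasets.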

CONCLUSION

The optimal differential gene combination (, and ) showed high diagnostic performance for osteoporosis risk.


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4580/8801634/798039f28faa/10.1177_11769343221075014-fig1.jpg
