Department of Dermatology, Yanbian University Hospital, Yanji, Jilin Province, China.
Department of Dermatology and Cutaneous Biology Research Institute, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
Aging (Albany NY). 2022 May 17;14(10):4270-4280. doi: 10.18632/aging.204084.
Osteoporosis is a severe chronic skeletal disorder that affects older individuals, especially postmenopausal women. However, molecular biomarkers for predicting the risk of osteoporosis are not well characterized. The aim of this study was to identify combined biomarkers for predicting the risk of osteoporosis using machine learning methods. We merged three publicly available gene expression datasets (GSE56815, GSE13850, and GSE2208) to obtain expression data for 6354 unique genes in postmenopausal women (45 with high bone mineral density and 45 with low bone mineral density). All machine learning methods were implemented in R; the GEOquery and limma packages were used for dataset download and differentially expressed gene identification, respectively, and a nomogram for predicting the risk of osteoporosis was constructed. We detected 378 significant differentially expressed genes using the limma package, representing 15 major biological pathways. The performance of the predictive models based on combined biomarkers (two or three genes) was superior to that of models based on a single gene. The best predictive gene set among two-gene sets included and . The best predictive gene set among three-gene sets included and . Overall, we demonstrated the advantages of using combined versus single biomarkers for predicting the risk of osteoporosis. Further, the predictive nomogram constructed using combined biomarkers could be used by clinicians to identify high-risk individuals and in the design of efficient clinical trials to reduce the incidence of osteoporosis.
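The comparison between single-gene and combined-gene models described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the study was carried out in R, whereas the sketch below uses only the Python standard library, and the abstract does not state which classifier or validation scheme was used. The expression values are synthetic, the names `gene_a` and `gene_b` are hypothetical placeholders rather than genes from the study, and logistic regression scored by in-sample AUC is an assumed stand-in for the models in the paper. Only the group sizes (45 high and 45 low bone mineral density) follow the abstract.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, lr=0.1, steps=1500):
    """Logistic regression fitted by batch gradient descent."""
    k, n = len(rows[0]), len(rows)
    weights, bias = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def simulate_cohort(n_per_group=45, seed=0):
    """SYNTHETIC data: two hypothetical genes sharing a large common noise term."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):  # 0 = high BMD, 1 = low BMD
        shift = 0.5 if label == 1 else -0.5
        for _ in range(n_per_group):
            shared = rng.gauss(0.0, 2.0)  # variation unrelated to BMD
            gene_a = shared + shift + rng.gauss(0.0, 0.3)
            gene_b = shared - shift + rng.gauss(0.0, 0.3)
            rows.append((gene_a, gene_b))
            labels.append(label)
    return rows, labels


def model_auc(rows, labels, columns):
    """Fit a model on the chosen gene columns and return its in-sample AUC."""
    subset = [[row[c] for c in columns] for row in rows]
    weights, bias = fit_logistic(subset, labels)
    scores = [bias + sum(w * v for w, v in zip(weights, x)) for x in subset]
    return auc(scores, labels)


rows, labels = simulate_cohort()
auc_a = model_auc(rows, labels, [0])      # gene_a alone
auc_b = model_auc(rows, labels, [1])      # gene_b alone
auc_ab = model_auc(rows, labels, [0, 1])  # gene_a and gene_b combined
print(f"gene_a: {auc_a:.3f}  gene_b: {auc_b:.3f}  combined: {auc_ab:.3f}")
```

In this toy setup each gene is only weakly informative on its own because the shared noise dominates, while the two-gene model can cancel that noise. The AUCs here are computed on the training data; an analysis on real expression data would need held-out samples or cross-validation to give an honest estimate.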