Integrative Harmonization of Phenotypic and Genomic Data Improves Bone Mineral Density Prediction in Multi-Study Osteoporosis Research.

Authors

Liu Anqi, Liu Jianing, Wu Lang, Wu Qing

Affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University.

Pacific Center for Genome Research, University of Hawai'i at Mānoa, Honolulu, HI, USA.

Publication information

medRxiv. 2025 May 13:2025.05.12.25327471. doi: 10.1101/2025.05.12.25327471.

Abstract

PURPOSE

Harmonizing osteoporosis-related data across multiple datasets is essential for improving the accuracy and generalizability of bone mineral density (BMD) assessments. This study developed a harmonization framework to standardize phenotypic and genomic variables across three major U.S. osteoporosis datasets: GDBF, GWAS, and NHANES.

METHODS

We standardized key phenotypic variables (BMD, body mass index (BMI), age, sex, and race/ethnicity) using cohort-specific data dictionaries and applied multiple imputation by chained equations (MICE) to handle missing data. Genomic data were harmonized using principal component analysis (PCA)-based batch effect correction. Residual regression methods were applied to standardize BMD values. The effect of harmonization on BMD prediction was evaluated using generalized estimating equations (GEE) and mixed-effects models.
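As an illustration of the residual-regression step, the sketch below regresses BMD on a set of covariates by ordinary least squares and rescales the residuals to unit variance. It is a minimal sketch, not the authors' pipeline: the choice of covariates (age, BMI, sex), the function name `residualize`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def residualize(y, covariates):
    """Regress y on covariates (plus an intercept) by ordinary least squares
    and return the residuals rescaled to unit variance."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # ddof equals the number of fitted parameters (unbiased residual variance).
    return resid / resid.std(ddof=X.shape[1])

# Synthetic, illustrative data only (not taken from the study).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 80, n)
bmi = rng.normal(27, 4, n)
sex = rng.integers(0, 2, n)
bmd = 1.0 - 0.004 * age + 0.01 * bmi + 0.05 * sex + rng.normal(0, 0.1, n)

# Hypothetical covariate set; the abstract does not list the ones actually used.
bmd_std = residualize(bmd, np.column_stack([age, bmi, sex]))
```

Because the fit includes an intercept, the standardized residuals have zero mean and are uncorrelated with each covariate, which is what makes values from different cohorts comparable after adjustment.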

RESULTS

Post-harmonization, inter-study variability in BMI was significantly reduced (Ω = 0.0028), and BMD associations with covariates remained consistent across datasets. Harmonized models showed improved predictive performance, with explained variance in BMD increasing (R² = 0.14). PCA confirmed the effective alignment of genetic data, reducing batch effects and improving cross-study compatibility.
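Explained variance is conventionally summarized by the coefficient of determination. Assuming that is the statistic reported for BMD above (the abstract does not define it), it compares the model's residual sum of squares with the total sum of squares, where $y_i$ is the observed BMD, $\hat{y}_i$ the model prediction, and $\bar{y}$ the sample mean:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under this reading, a value of 0.14 would mean the model accounts for about 14% of the variance in BMD.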

CONCLUSION

This study demonstrates the feasibility and effectiveness of harmonizing phenotypic and genomic data for osteoporosis research. The harmonization framework enhances BMD prediction accuracy, supports more inclusive osteoporosis risk assessment, and improves the integration of multi-cohort datasets for future research. These findings highlight the potential of data harmonization in advancing precision medicine for osteoporosis prevention and management.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6990/12132160/5376d677bc99/nihpp-2025.05.12.25327471v1-f0001.jpg
