A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis.

Author information

Li Ruowang, Benz Luke, Duan Rui, Denny Joshua C, Hakonarson Hakon, Mosley Jonathan D, Smoller Jordan W, Wei Wei-Qi, Lumley Thomas, Ritchie Marylyn D, Moore Jason H, Chen Yong

Affiliations

Department of Computational Biomedicine, Cedars-Sinai Medical Center.

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Publication information

medRxiv. 2024 Dec 4:2024.01.09.24301073. doi: 10.1101/2024.01.09.24301073.

Abstract

In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,534 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.
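The phrase "one-shot, lossless" integration via summary statistics can be illustrated with a deliberately simplified case. The sketch below is not the mixWAS algorithm, which handles mixed outcome types and cohort-specific covariate associations. It assumes a single continuous outcome, one predictor, and a simple linear regression shared across cohorts; the function names and the toy data are hypothetical. Each cohort sends one small tuple of sums (a single round of communication), and the slope recovered from those sums matches the slope obtained from pooling the individual-level data.

```python
from typing import Sequence, Tuple

# (n, sum_x, sum_y, sum_xx, sum_xy): the only values a cohort shares.
Summary = Tuple[float, float, float, float, float]


def local_summary(x: Sequence[float], y: Sequence[float]) -> Summary:
    """Computed inside one cohort; no individual-level rows leave the site."""
    return (
        float(len(x)),
        sum(x),
        sum(y),
        sum(xi * xi for xi in x),
        sum(xi * yi for xi, yi in zip(x, y)),
    )


def combine(summaries: Sequence[Summary]) -> Summary:
    """Coordinator step: element-wise sum of the per-cohort tuples."""
    return tuple(sum(col) for col in zip(*summaries))


def slope_from_summary(s: Summary) -> float:
    """Least-squares slope of y on x, using only the aggregated sums."""
    n, sx, sy, sxx, sxy = s
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def slope_from_pooled_data(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference: the same slope computed with access to all raw data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


# Toy data for two hypothetical cohorts: (predictor values, outcome values).
cohorts = [
    ([0, 1, 2, 1], [1.0, 1.9, 3.2, 2.1]),
    ([2, 0, 1], [2.8, 0.7, 2.2]),
]

distributed = slope_from_summary(combine([local_summary(x, y) for x, y in cohorts]))

pooled_x = [xi for x, _ in cohorts for xi in x]
pooled_y = [yi for _, y in cohorts for yi in y]
pooled = slope_from_pooled_data(pooled_x, pooled_y)

# The two estimates agree up to floating-point rounding.
print(abs(distributed - pooled) < 1e-9)
```

In this toy setting the sums are sufficient statistics for the regression, so nothing is lost by sharing them instead of the raw records. That is the sense of "lossless" the example is meant to convey.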

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/438d/12234047/ec4e337eaea0/nihpp-2024.01.09.24301073v3-f0001.jpg
