A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis.

Author information

Li Ruowang, Benz Luke, Duan Rui, Denny Joshua C, Hakonarson Hakon, Mosley Jonathan D, Smoller Jordan W, Wei Wei-Qi, Lumley Thomas, Ritchie Marylyn D, Moore Jason H, Chen Yong

Affiliations

Department of Computational Biomedicine, Cedars-Sinai Medical Center.

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Publication information

medRxiv. 2024 Dec 4:2024.01.09.24301073. doi: 10.1101/2024.01.09.24301073.

DOI:10.1101/2024.01.09.24301073

PMID:38260403

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10802662/

Abstract

In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,534 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.
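The abstract does not spell out the computation, so the following is only an illustrative sketch of what "one-shot" and "lossless" can mean for summary-statistic integration. It is not the mixWAS implementation. It assumes a single continuous outcome, a linear model, and a score test for one variant; the function names `site_summary` and `combine` are hypothetical. Each cohort adjusts for its own covariates locally and shares two numbers once. Because the covariate effects are cohort-specific, the summed statistic equals the one a pooled analysis would produce with separate covariate effects and residual variance per cohort.

```python
import numpy as np

def site_summary(y, g, X):
    """Hypothetical per-cohort step: score U and information V for the variant.

    y: outcome vector, g: genotype vector, X: cohort-specific covariates.
    Only (U, V) leave the site; no individual-level data is shared.
    """
    X1 = np.column_stack([np.ones(len(y)), X])      # cohort-specific intercept + covariates
    coef_y, *_ = np.linalg.lstsq(X1, y, rcond=None)
    coef_g, *_ = np.linalg.lstsq(X1, g, rcond=None)
    ry = y - X1 @ coef_y                            # outcome residuals under the null
    rg = g - X1 @ coef_g                            # genotype residualized on the same covariates
    sigma2 = ry @ ry / (len(y) - X1.shape[1])       # cohort-specific residual variance
    return rg @ ry / sigma2, rg @ rg / sigma2

def combine(summaries):
    """Hypothetical aggregation step: one round of communication, then sum."""
    U = sum(u for u, _ in summaries)
    V = sum(v for _, v in summaries)
    return U ** 2 / V                               # 1-df chi-square score statistic

# Simulated example with two cohorts of different sizes (made-up data).
rng = np.random.default_rng(0)
cohorts = []
for n in (200, 300):
    X = rng.normal(size=(n, 2))
    g = rng.binomial(2, 0.3, size=n).astype(float)
    y = X @ rng.normal(size=2) + 0.2 * g + rng.normal(size=n)
    cohorts.append((y, g, X))

stat = combine([site_summary(*c) for c in cohorts])
print(f"combined score statistic: {stat:.3f}")
```

Handling mixed binary and continuous outcomes simultaneously, as the abstract describes, would need more than this single-outcome sketch covers.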

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/438d/12234047/ec4e337eaea0/nihpp-2024.01.09.24301073v3-f0001.jpg
