Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization.

Authors

Hoang Bao, Pang Yijiang, Liang Siqi, Zhan Liang, Thompson Paul M, Zhou Jiayu

Affiliations

Michigan State University, East Lansing, Michigan, USA.

University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Publication

KDD. 2024;2024:5105-5115. doi: 10.1145/3637528.3671590. Epub 2024 Aug 24.

Abstract

Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. Because medical data is inherently decentralized, collecting it from multiple sites or institutions is a common way to obtain sufficient clinical diversity. However, data from different sites are easily biased by local environments or facilities, which violates the i.i.d. assumption. A common remedy is to harmonize away the site bias while retaining important biological information. COMBAT is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when new sites join during training, or when data from unknown or unseen sites must be evaluated, COMBAT lacks compatibility and requires retraining with data from all sites. This retraining incurs significant computational and logistical overhead that is usually prohibitive. In this work, we develop a novel harmonization algorithm that leverages cluster patterns of the data across sites and greatly improves the usability of COMBAT harmonization. We use extensive simulations and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is available at https://github.com/illidanlab/distributed-cluster-harmonization.
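To make the retraining problem concrete, here is a minimal sketch of a location-scale batch adjustment for a single feature. It is not the authors' algorithm and not the full COMBAT model (covariates and empirical Bayes shrinkage are omitted). The function names, and the rule of matching an unseen site to the fitted group with the closest mean, are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def fit_group_params(values_by_group):
    """Estimate (mean, std) per group and for the pooled data of one feature."""
    pooled = [v for vals in values_by_group.values() for v in vals]
    pooled_param = (mean(pooled), pstdev(pooled))
    group_params = {g: (mean(v), pstdev(v)) for g, v in values_by_group.items()}
    return group_params, pooled_param


def harmonize(values, group_param, pooled_param):
    """Standardize with the group's statistics, then map onto the pooled scale.

    Assumes the group's standard deviation is non-zero.
    """
    g_mean, g_std = group_param
    p_mean, p_std = pooled_param
    return [(v - g_mean) / g_std * p_std + p_mean for v in values]


def nearest_group(new_values, group_params):
    """Hypothetical assignment rule: pick the fitted group with the closest mean.

    Reusing that group's stored parameters lets an unseen site be adjusted
    without refitting on the data of every site.
    """
    m = mean(new_values)
    return min(group_params, key=lambda g: abs(group_params[g][0] - m))


# Toy data (illustrative only): two sites with a pure location shift.
sites = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}
params, pooled = fit_group_params(sites)
harmonized = {s: harmonize(v, params[s], pooled) for s, v in sites.items()}

# An unseen site reuses the parameters of its nearest fitted group.
unseen = [10.5, 12.5]
adjusted_unseen = harmonize(unseen, params[nearest_group(unseen, params)], pooled)
```

In a purely per-site scheme the fitted parameters are keyed by site, so an unseen site has no entry and the model must be refit. Keying the parameters by data-driven groups instead is one way to read the cluster idea in the abstract; the repository linked above contains the actual method.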

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cb7/11529347/8af6286e3503/nihms-2029086-f0001.jpg
