Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics.

Authors

Ma Jing, Zhang Qiuchen, Lou Jian, Xiong Li, Ho Joyce C

Affiliation

Emory University.

Publication

Proc Int World Wide Web Conf. 2021 Apr;2021:171-182. doi: 10.1145/3442381.3449832.

Abstract

Modern healthcare systems, formed by a web of entities (e.g., hospitals, clinics, pharmacy companies), are collecting a huge volume of healthcare data from a large number of individuals, covering various medical procedures, medications, diagnoses, and lab tests. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has proven to be an effective approach and has received increasing research attention, owing to its intrinsic capability to represent high-dimensional data. Recently, federated learning has offered a privacy-preserving paradigm for collaborative learning among different entities, which seemingly makes it an ideal candidate for extending tensor factorization-based collaborative phenotyping to sensitive personal health data. However, existing attempts at federated tensor factorization come with various limitations, including restriction to the classic tensor factorization, high communication cost, and reduced accuracy. We propose a federated tensor factorization that is flexible enough to choose from a variety of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to generalized tensor factorization, which reduces the uplink communication cost by up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronic health record datasets demonstrate the efficiency improvements in terms of computation and communication cost.
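The abstract combines two mechanisms: a loss chosen to match the data type, and compressed uplink gradients. The Python sketch below illustrates both under stated assumptions and is not the authors' algorithm. It assumes a three-way CP model (e.g., patients × diagnoses × medications) whose patient factor A stays at each site while factors B and C are shared, three example losses (Gaussian, Bernoulli with logit link, Poisson with log link), and scaled-sign compression with error feedback as the compression step. All function names (loss_grad, cp_entry, grad_B, compress_with_error_feedback, server_aggregate, uplink_bits, federated_round) are hypothetical.

```python
import math

# Illustrative sketch only (hypothetical names, not the paper's code).
# Assumptions: 3-way CP model X[i][j][k] ~ sum_r A[i][r]*B[j][r]*C[k][r];
# patient rows (A) stay at each site, factors B and C are shared;
# uplink compression is scaled sign with error feedback.


def loss_grad(loss, x, m):
    """Derivative of an elementwise loss f(x, m) w.r.t. the model entry m."""
    if loss == "gaussian":   # f = (x - m)^2 / 2, continuous data
        return m - x
    if loss == "bernoulli":  # f = log(1 + e^m) - x*m, binary data (logit link)
        return 1.0 / (1.0 + math.exp(-m)) - x
    if loss == "poisson":    # f = e^m - x*m, count data (log link)
        return math.exp(m) - x
    raise ValueError(f"unknown loss: {loss}")


def cp_entry(A, B, C, i, j, k):
    """Model entry M[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def grad_B(X, A, B, C, loss):
    """Gradient of sum_ijk f(X[i][j][k], M[i][j][k]) w.r.t. the shared factor B."""
    R = len(B[0])
    G = [[0.0] * R for _ in B]
    for i in range(len(A)):
        for j in range(len(B)):
            for k in range(len(C)):
                g = loss_grad(loss, X[i][j][k], cp_entry(A, B, C, i, j, k))
                for r in range(R):
                    # dM[i][j][k] / dB[j][r] = A[i][r] * C[k][r]
                    G[j][r] += g * A[i][r] * C[k][r]
    return G


def compress_with_error_feedback(grad, error):
    """Scaled-sign compression of a flat vector, keeping the residual locally."""
    p = [g + e for g, e in zip(grad, error)]        # add back last round's residual
    scale = sum(abs(v) for v in p) / len(p)          # one float per message
    signs = [1 if v >= 0 else -1 for v in p]         # one bit per entry
    new_error = [v - scale * s for v, s in zip(p, signs)]
    return scale, signs, new_error


def server_aggregate(payloads):
    """Average the decompressed updates scale * sign over all sites."""
    total = [0.0] * len(payloads[0][1])
    for scale, signs in payloads:
        for idx, s in enumerate(signs):
            total[idx] += scale * s
    return [t / len(payloads) for t in total]


def uplink_bits(n_entries, float_bits=32):
    """(dense bits, compressed bits) for one gradient of n_entries values."""
    return n_entries * float_bits, float_bits + n_entries


def federated_round(sites, B, C, errors, loss, lr=0.1):
    """One round updating only B; a full method would also update A and C."""
    R = len(B[0])
    payloads = []
    for s, (X, A) in enumerate(sites):
        flat = [v for row in grad_B(X, A, B, C, loss) for v in row]
        scale, signs, errors[s] = compress_with_error_feedback(flat, errors[s])
        payloads.append((scale, signs))
    step = server_aggregate(payloads)
    return [[B[j][r] - lr * step[j * R + r] for r in range(R)] for j in range(len(B))]
```

For example, uplink_bits(1000) returns (32000, 1032), so sign compression alone cuts that message by about 96.8%. The 99.90% reduction reported in the abstract comes from the authors' full three-level strategy, which this sketch does not attempt to reproduce. Error feedback is included because plain sign compression is biased; carrying the residual forward lets the discarded part of each gradient be applied in later rounds.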
