Ngo Hieu, Fang Hua, Rumbut Joshua, Wang Honggang
College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747.
Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA.
IEEE Internet Things J. 2024 Apr 15;11(8):14657-14670. doi: 10.1109/jiot.2023.3343719. Epub 2023 Dec 18.
The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated, and anonymization is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized federated fuzzy clustering algorithm, based on multiple imputation, for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of data. Unlike previous FL methods, the proposed algorithm requires only two rounds of communication and handles clients whose incomplete longitudinal data span varying numbers of time points. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and performs well on multiple clustering metrics. This new method allows treatments to be targeted to different patient groups while preserving their data privacy, and it opens the way to broader applications in the Internet of Medical Things.
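The idea of aggregating model parameters instead of data in two communication rounds can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes complete, fixed-length feature vectors (so it omits multiple imputation and varying time points), uses plain fuzzy c-means, and assumes a hypothetical protocol in which clients first send only their local cluster centers and the server then broadcasts the aggregated centers. All function names are illustrative.

```python
import numpy as np

def init_centers(X, c):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [X[0]]
    for _ in range(c - 1):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centers.append(X[np.argmax(nearest)])
    return np.array(centers)

def memberships(X, centers, m=2.0):
    """Fuzzy c-means membership matrix; each row sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_cmeans(X, c, m=2.0, iters=100):
    """Plain fuzzy c-means; returns the cluster centers only."""
    centers = init_centers(X, c)
    for _ in range(iters):
        W = memberships(X, centers, m) ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers

def federated_fuzzy_clustering(client_data, c, m=2.0):
    # Round 1: each client clusters locally and shares only its centers.
    local_centers = [fuzzy_cmeans(X, c, m) for X in client_data]
    # Server: aggregate by clustering the stacked local centers.
    global_centers = fuzzy_cmeans(np.vstack(local_centers), c, m)
    # Round 2: the server broadcasts the global centers; each client
    # computes memberships for its own records without sharing them.
    local_memberships = [memberships(X, global_centers, m) for X in client_data]
    return global_centers, local_memberships

# Simulated example: three clients with different sample sizes,
# each holding two well-separated groups of two-dimensional records.
rng = np.random.default_rng(42)
clients = [
    np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(10.0, 1.0, (n, 2))])
    for n in (40, 60, 80)
]
centers, U_list = federated_fuzzy_clustering(clients, c=2)
```

In this sketch, raw records never leave a client: only `c` center vectors per client travel to the server, and only the aggregated centers travel back. A fuller treatment would also need to handle missing values and unequal numbers of time points, which the abstract identifies as central to the proposed method.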