Academy of Arts and Design, Tsinghua University, Beijing, China.
The Future Laboratory, Tsinghua University, Beijing, China.
PLoS One. 2020 Apr 17;15(4):e0230706. doi: 10.1371/journal.pone.0230706. eCollection 2020.
Intensive care data are valuable for improving health care, informing policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in separate data silos. Sharing data among these sources is a major challenge for regulatory, operational and security reasons. One potential solution is federated machine learning, a method that sends machine learning algorithms simultaneously to all data sources, trains a model at each source and aggregates the learned models. This strategy allows valuable data to be used without moving them. One challenge in applying federated machine learning is that data from diverse sources may follow different distributions. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated learning performance in independent and identically distributed (IID) and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.
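The train-locally-then-aggregate loop described in the abstract can be illustrated with a minimal sketch. It assumes plain sample-weighted model averaging over logistic regression models and is not the LoAdaBoost method itself, whose details the abstract does not give. The synthetic clients, their feature shifts, and all function names (`local_train`, `federated_average`, `make_client`, `run_federated`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD for logistic regression on one client's data."""
    w = list(weights)  # copy so the shared global model is not mutated
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                w[j] -= lr * (p - y) * xj
    return w


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client(rng, n, shift):
    """Hypothetical synthetic client. Features are [bias, x1, x2] and the
    label is 1 when x1 + x2 > 0. `shift` moves the feature means, so clients
    with different shifts hold differently distributed (non-IID) data."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        x2 = rng.gauss(-shift, 1.0)
        data.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))
    return data


def accuracy(w, data):
    """Fraction of samples whose predicted class matches the label."""
    correct = 0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        correct += int((p >= 0.5) == (y == 1))
    return correct / len(data)


def run_federated(clients, rounds=20):
    """Each round: send the global model out, train locally, aggregate."""
    w = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        local_models = [local_train(w, c) for c in clients]
        w = federated_average(local_models, [len(c) for c in clients])
    return w


rng = random.Random(0)
clients = [make_client(rng, 60, shift) for shift in (-1.0, 0.0, 1.0)]
global_w = run_federated(clients)
pooled = [row for c in clients for row in c]
print(f"accuracy on pooled data: {accuracy(global_w, pooled):.2f}")
```

Only model weights cross the client boundary in this sketch; the raw records never leave `clients[i]`. The pooled evaluation at the end is a convenience of the toy setup and would not be available in a real deployment where data cannot be moved.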