University of California San Diego Health System Department of Biomedical Informatics, La Jolla, CA 92130, USA.
These authors contributed equally. Corresponding Author: Lucila Ohno-Machado, MD, MBA, PhD (
AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:355-364. eCollection 2021.
Federated learning on data held by multiple participating parties is attracting growing attention and has many healthcare applications. We previously developed VERTIGO, a distributed logistic regression model for vertically partitioned data. The model exploits the linear separation property of kernel matrices in a dual-space model to harmonize information in a privacy-preserving manner. However, VERTIGO does not handle variance estimation and provides only point estimates, so it cannot report test statistics or their associated P-values. In this work, we extend VERTIGO by introducing a novel ring-structure protocol that passes intermediary statistics among clients and reconstructs the covariance matrix in the dual space. This extension, VERTIGO-CI, is a complete protocol for constructing a logistic regression model from vertically partitioned datasets as if it were trained on combined data in a centralized setting. We evaluated our results on synthetic and real data, showing equivalent accuracy and tolerable performance overhead compared with the centralized version. This extension can be applied to other types of generalized linear models that have dual objectives.
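The linear separation property mentioned in the abstract can be illustrated with a small numerical check. For a linear kernel, the Gram matrix of the combined data equals the sum of the Gram matrices that each party computes from its own columns. The sketch below is only an illustration of that identity under assumed toy data; it is not the VERTIGO or VERTIGO-CI protocol, and it omits the dual optimization, the ring-structure exchange, and any privacy protection. The function name `local_gram`, the party count, and the feature dimensions are hypothetical.

```python
import numpy as np


def local_gram(X_k):
    """Linear-kernel Gram matrix one party computes from its own feature columns."""
    return X_k @ X_k.T


rng = np.random.default_rng(0)
n_patients = 6

# Hypothetical vertical partition: three parties hold different features
# (2, 3, and 4 columns) for the same six patients, in the same row order.
X_parts = [rng.normal(size=(n_patients, d)) for d in (2, 3, 4)]

# Each party shares only its n-by-n local Gram matrix; the sum is the global kernel.
K_federated = sum(local_gram(X_k) for X_k in X_parts)

# Reference: the kernel computed on the pooled (centralized) feature matrix.
X_combined = np.hstack(X_parts)
K_centralized = X_combined @ X_combined.T

kernels_match = np.allclose(K_federated, K_centralized)
print(kernels_match)
```

Because the global kernel is a plain sum of local terms, a dual-space model that depends on the data only through this kernel can, in principle, be fit without pooling the raw feature columns.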