Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy.

Affiliations

Cyber Security and Robustness, Netherlands Organisation for Applied Scientific Research, The Hague, The Netherlands.

Cryptology, Centrum Wiskunde and Informatica, Amsterdam, The Netherlands.

Publication information

BMC Med Inform Decis Mak. 2022 Feb 24;22(1):49. doi: 10.1186/s12911-022-01771-3.

Abstract

BACKGROUND

Analysing distributed medical data is challenging because the data are sensitive and various regulations restrict accessing and combining them. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, where organisations hold data on overlapping sets of people. We use an emerging technology based on cryptographic techniques, called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.
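To give a flavour of what MPC makes possible, the following is a minimal sketch of additive secret sharing, in which several parties learn the sum of their private inputs without revealing the inputs themselves. It is a generic textbook illustration and not the protocol used in the paper; the names `PRIME`, `share`, `reconstruct` and `secure_sum` are hypothetical, and the sketch assumes non-negative integer inputs whose sum stays below the modulus.

```python
import secrets

# Field modulus; an illustrative choice, not taken from the paper.
PRIME = 2**61 - 1


def share(value, n_parties=3):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; only the complete set reveals the value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Each party shares its input; party j adds the j-th shares locally."""
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    # Party j sees one random-looking share per input and publishes
    # only the sum of the shares it holds.
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    return reconstruct(partial_sums)
```

Any single share is uniformly random, so a party holding one share per input learns nothing about the individual inputs; only the published partial sums, taken together, reveal the total.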

METHODS

We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. Securely computing the log-partial likelihood in each iteration poses several technical challenges for preserving the efficiency and security of our solution. To tackle these challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to provide insight into the computational and communication effort that is needed.
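For orientation, here is a minimal plaintext (non-secure) sketch of a Newton-Raphson iteration on the Cox log-partial likelihood. It is not the authors' secure implementation: it runs on data held in one place, assumes no tied event times, and the function names `solve`, `gradient_and_hessian` and `fit_cox` are hypothetical. The exponentiations and the Hessian solve are the steps that the abstract says require dedicated secure protocols.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def gradient_and_hessian(beta, times, events, covs):
    """Gradient and Hessian of the log-partial likelihood at beta."""
    p = len(beta)
    # Relative hazards exp(x_j . beta) for every subject.
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in covs]
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only via risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        s0 = sum(w[j] for j in risk)
        s1 = [sum(w[j] * covs[j][a] for j in risk) for a in range(p)]
        for a in range(p):
            grad[a] += covs[i][a] - s1[a] / s0
            for b in range(p):
                s2 = sum(w[j] * covs[j][a] * covs[j][b] for j in risk)
                hess[a][b] -= s2 / s0 - s1[a] * s1[b] / s0 ** 2
    return grad, hess


def fit_cox(times, events, covs, iterations=20, tol=1e-9):
    """Newton-Raphson: beta <- beta - H^{-1} g, starting from zero."""
    beta = [0.0] * len(covs[0])
    for _ in range(iterations):
        grad, hess = gradient_and_hessian(beta, times, events, covs)
        step = solve(hess, grad)
        beta = [b - s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For example, `fit_cox([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 1, 1], [[1], [0], [1], [0], [1], [0]])` returns a single fitted coefficient. In this naive form the risk-set sums are recomputed for every event, which makes each iteration quadratic in the number of subjects.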

RESULTS

Our secure solution is implemented in a setting with three different machines, each representing a different data holder, that communicate over the internet. The MPyC platform is used to implement this privacy-preserving solution for obtaining the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient.

CONCLUSIONS

Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably longer, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe07/8867891/60db35a6ef1e/12911_2022_1771_Fig1_HTML.jpg
