IEEE J Biomed Health Inform. 2021 Sep;25(9):3310-3320. doi: 10.1109/JBHI.2021.3071270. Epub 2021 Sep 3.
The Cox proportional hazards model is one of the most widely used methods for analyzing survival data. Pooling data from multiple data providers improves the generalizability and confidence of Cox analysis results; however, such data sharing may leak sensitive information, leading to financial fraud, social discrimination, or unauthorized data abuse. Several privacy-preserving Cox regression protocols have been proposed in recent years, but they lack either security or functionality. In this paper, we propose a privacy-preserving Cox regression protocol for multiple data providers and researchers. The proposed protocol allows researchers to train models on horizontally or vertically partitioned datasets while protecting both the sensitive data and the trained models. The protocol relies on threshold homomorphic encryption to guarantee security. Experimental results show that, with the proposed protocol, training a Cox regression model with 9 variables on a dataset of 113,035 samples takes approximately 44 min, and the trained model is almost the same as that obtained with the original nonsecure Cox regression protocol; the protocol is therefore a potential candidate for practical real-world applications in multicenter medical research.
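To make the horizontally partitioned setting concrete, the following is a minimal plaintext sketch of the Cox partial log-likelihood (Breslow form) in which each provider contributes only per-event-time sums of exp(x·β) over its own subjects. It is not the protocol proposed in the paper and contains no encryption. It only illustrates that the risk-set denominators are plain sums across providers, which is the kind of aggregate an additively homomorphic scheme could in principle combine. The function names, the toy data, and the assumption that event times may be shared among providers are all hypothetical.

```python
import math

def linear_predictor(beta, x):
    # x . beta for one subject
    return sum(b * xi for b, xi in zip(beta, x))

def local_risk_sums(rows, beta, event_times):
    # One provider's contribution: for each event time t, the sum of
    # exp(x . beta) over its own subjects still at risk (time >= t).
    return [
        sum(math.exp(linear_predictor(beta, x)) for time, _, x in rows if time >= t)
        for t in event_times
    ]

def partial_log_likelihood(providers, beta):
    # providers: list of datasets, each a list of (time, event, covariates).
    # Assumption for illustration only: event times are visible to all
    # parties; a real protocol would have to protect them as well.
    events = [(time, x) for rows in providers for time, event, x in rows if event == 1]
    event_times = [t for t, _ in events]
    per_provider = [local_risk_sums(rows, beta, event_times) for rows in providers]
    # Only these aggregates are combined; raw rows never leave a provider.
    denominators = [sum(column) for column in zip(*per_provider)]
    return sum(
        linear_predictor(beta, x) - math.log(d)
        for (_, x), d in zip(events, denominators)
    )

# Hypothetical toy data: (time, event indicator, covariates)
site_a = [(5.0, 1, [1.0]), (8.0, 0, [0.0])]
site_b = [(3.0, 1, [0.0]), (9.0, 1, [1.0])]

pooled = partial_log_likelihood([site_a + site_b], [0.7])
split = partial_log_likelihood([site_a, site_b], [0.7])
```

In this sketch `pooled` and `split` agree up to floating-point rounding, which mirrors the abstract's claim that the securely trained model is almost the same as the nonsecure one. The vertically partitioned case, where providers hold different covariates of the same subjects, does not decompose this simply and is not covered here.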