Dankar Fida K, Madathil Nisha, Dankar Samar K, Boughorbel Sabri
United Arab Emirates University, Abu Dhabi, United Arab Emirates.
Independent Scientist, Ottawa, ON, Canada.
JMIR Med Inform. 2019 Apr 29;7(2):e12702. doi: 10.2196/12702.
Biomedical research often requires large cohorts, which means sharing biomedical data with researchers around the world and raises many privacy, ethical, and legal concerns. In response, privacy experts are exploring ways to analyze distributed data while protecting its privacy. Many of these approaches are based on secure multiparty computation (SMC). SMC is attractive because it allows multiple parties to carry out calculations collectively on their datasets without revealing their own raw data; however, it incurs heavy computation time and requires extensive communication between the parties involved.
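To make the idea of computing on data without revealing it concrete, the following is a minimal sketch of one classic SMC building block, a secure sum based on additive secret sharing. It is an illustration only and is not taken from the study: the modulus, the function names `make_shares` and `secure_sum`, and the simulation of all parties inside one process are assumptions, and the parties are assumed to follow the protocol honestly.

```python
import secrets

# Assumed modulus for the illustration; the true sum must be smaller than this.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split a non-negative integer into n random shares that sum to it (mod modulus)."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure sum: no single share or partial sum reveals an input."""
    n = len(private_values)
    # Party i splits its value into n shares; share j would be sent to party j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Adding the published partial sums recovers the total of the inputs.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Hypothetical per-site patient counts.
    print(secure_sum([120, 75, 310]))
```

In a real deployment each share would travel over a network between separate parties, which is where the communication cost mentioned above comes from.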
This study aimed to develop usable and efficient SMC applications that meet the needs of the potential end-users and to raise general awareness about SMC as a tool that supports data sharing.
We introduced distributed statistical computing (DSC) into the design of secure multiparty protocols. DSC allows computations to be carried out independently at each party's site and then combined to form a single estimator for the collective dataset, which limits communication to the final step and reduces complexity. We demonstrate the effectiveness of our privacy-preserving model through a linear regression application.
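The compute-locally-then-combine idea can be sketched for ordinary least squares, where each site only needs to contribute the summary matrices X^T X and X^T y. This is a hedged illustration of the general principle, not the authors' protocol: it contains no cryptographic protection, the function names (`local_statistics`, `combine`, `solve`) and the toy data are hypothetical, and it is written without dependencies purely so that it runs anywhere. In a secure version, the summation inside `combine` is presumably the only step that would need to be protected.

```python
def local_statistics(rows, targets):
    """Run at one site on its own data: return (X^T X, X^T y) with an intercept."""
    p = len(rows[0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in zip(rows, targets):
        x = [1.0] + list(features)  # prepend the intercept term
        for a in range(p):
            xty[a] += x[a] * target
            for b in range(p):
                xtx[a][b] += x[a] * x[b]
    return xtx, xty


def combine(stats):
    """The only cross-site step: add the per-site summaries element-wise."""
    p = len(stats[0][1])
    xtx = [[sum(s[0][a][b] for s in stats) for b in range(p)] for a in range(p)]
    xty = [sum(s[1][a] for s in stats) for a in range(p)]
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


# Toy data generated from y = 2 + 3x and split across three hypothetical sites.
sites = [
    ([[0.0], [1.0]], [2.0, 5.0]),
    ([[2.0], [3.0]], [8.0, 11.0]),
    ([[4.0]], [14.0]),
]

beta_distributed = solve(*combine([local_statistics(r, t) for r, t in sites]))

# Reference fit on the pooled data, which the sites would not share in practice.
pooled_rows = [row for r, _ in sites for row in r]
pooled_targets = [y for _, t in sites for y in t]
beta_pooled = solve(*local_statistics(pooled_rows, pooled_targets))
```

Because the pooled X^T X and X^T y are exactly the sums of the per-site matrices, the combined estimator matches the pooled one up to floating-point rounding, and each site sends only one small matrix and one vector regardless of how many records it holds.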
We tested our secure linear regression algorithm for accuracy and performance using real and synthetic datasets. The results showed no loss of accuracy compared with nonsecure regression and very good performance (20 min for 100 million records).
We used DSC to securely calculate a linear regression model over multiple datasets. Our experiments showed very good performance in terms of the number of records the method can handle. We plan to extend our method to other estimators, such as logistic regression.