Privacy-preserving logistic regression training.

Affiliation

imec-Cosic, Dept. Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10, Leuven, Belgium.

Publication information

BMC Med Genomics. 2018 Oct 11;11(Suppl 4):86. doi: 10.1186/s12920-018-0398-y.

Abstract

BACKGROUND

Logistic regression is a popular machine learning technique for constructing classification models. Since constructing such models requires computing on large datasets, outsourcing this computation to a cloud service is appealing. The privacy-sensitive nature of the input data, however, requires appropriate privacy-preserving measures before outsourcing. Homomorphic encryption enables computation directly on encrypted data, without decryption, and can therefore mitigate the privacy concerns raised by using a cloud service.
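
To make "computing on encrypted data" concrete, the following is a minimal sketch using a toy Paillier cryptosystem, which is additively homomorphic. Paillier is chosen here only for brevity; the abstract does not say which scheme the authors use, and training a model also needs multiplications between ciphertexts, which Paillier does not provide. All function names and the tiny hard-coded primes are illustrative assumptions and offer no security.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation. The small default primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, secret, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = secret
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod n)."""
    return pow(c, k, n * n)

if __name__ == "__main__":
    n, secret = keygen()
    total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    print(decrypt(n, secret, total))  # 42, computed without decrypting the inputs
```

The party holding only `n` and the ciphertexts can run `add_encrypted` and `scale_encrypted`; only the holder of `secret` can read the result.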

METHODS

In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity.
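
The abstract does not spell out the update rule. As an illustration only, the plaintext Python sketch below shows one way a fixed Hessian iteration can be simplified. The standard fixed Hessian method replaces the true Hessian by the constant bound -(1/4) XᵀX; the sketch goes one step further and replaces that matrix by a diagonal one whose entries are its row sums, so each coordinate update needs a single scalar division instead of a matrix inverse. The function names, the 0/1 label convention, and the requirement that features lie in [0, 1] with a leading intercept column are assumptions of this sketch, not details confirmed by the source. An encrypted implementation would additionally have to replace the sigmoid and the division with operations the encryption scheme supports, which is not attempted here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(X, y, beta):
    """Logistic log-likelihood for 0/1 labels."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        total += label * z - math.log1p(math.exp(z))
    return total

def train_simplified_fixed_hessian(X, y, iterations=20):
    """Plaintext sketch. X: rows with entries in [0, 1], first entry 1 (intercept).
    y: labels in {0, 1}. Returns the coefficient list beta."""
    n, d = len(X), len(X[0])
    row_sums = [sum(row) for row in X]
    # Diagonal surrogate for -(1/4) X^T X: entry k is the k-th row sum of that
    # matrix, i.e. -(1/4) * sum_i x_ik * (sum_j x_ij). Computed once, then reused.
    h = [-0.25 * sum(X[i][k] * row_sums[i] for i in range(n)) for k in range(d)]
    beta = [0.0] * d
    for _ in range(iterations):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient of the log-likelihood: X^T (y - p).
        g = [sum((y[i] - p[i]) * X[i][k] for i in range(n)) for k in range(d)]
        # Newton-like step with the fixed diagonal surrogate: beta - g / h.
        beta = [beta[k] - g[k] / h[k] for k in range(d)]
    return beta
```

Because the surrogate is fixed, it is computed once before the loop, and every iteration costs only the gradient evaluation plus one scalar multiplication per coefficient. With nonnegative features the diagonal surrogate dominates the fixed Hessian bound, so each step does not decrease the log-likelihood, at the price of slower convergence than a full Newton step.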

RESULTS

We test the new method on two real-life applications. The first is in medicine: the model predicts the probability that a patient has cancer, given genomic data as input. The second is in finance: the model predicts the probability that a credit card transaction is fraudulent. For both applications the method produces accurate results, comparable to those of standard algorithms run on plaintext data.

CONCLUSIONS

This article introduces a new, simple iterative algorithm for training a logistic regression model, tailored to homomorphically encrypted datasets. The algorithm can serve as a privacy-preserving technique for building binary classification models and applies to a wide range of problems that can be modelled with logistic regression. Our implementation results show that the method can handle the large datasets used in logistic regression training.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c9b/6180357/600d595c836d/12920_2018_398_Fig1_HTML.jpg