VERTIcal Grid lOgistic regression (VERTIGO).

Author information

Li Yong, Jiang Xiaoqian, Wang Shuang, Xiong Hongkai, Ohno-Machado Lucila

Affiliations

EE Department, Shanghai Jiaotong University, Shanghai, China, 200240.

Department of Biomedical Informatics, UC San Diego, La Jolla, California, USA

Publication information

J Am Med Inform Assoc. 2016 May;23(3):570-9. doi: 10.1093/jamia/ocv146. Epub 2015 Nov 9.

Abstract

OBJECTIVE

To develop an accurate logistic regression (LR) algorithm to support federated data analysis of vertically partitioned distributed data sets.

MATERIAL AND METHODS

We propose a novel technique that solves the binary LR problem through dual optimization to obtain a global solution for vertically partitioned data. We evaluated this new method, VERTIcal Grid lOgistic regression (VERTIGO), on artificial and real-world medical classification problems in terms of the area under the receiver operating characteristic curve (AUC), calibration, and computational complexity. We assumed that the institutions could "align" patient records (through patient identifiers or hashed "privacy-protecting" identifiers) and that they both had access to the values of the dependent variable in the LR model (e.g., if the model predicts death, both institutions would have the same information about death).
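In a vertically partitioned setting, each institution holds different columns (variables) for the same aligned patients. A dual formulation of LR can depend on the data only through the m x m matrix of inner products between patient records. That matrix can be assembled from per-institution pieces, because X X^T = sum_k X_k X_k^T when the columns of X are split into blocks X_k. The abstract does not spell out this decomposition, so treat the following as an assumption. This minimal sketch checks the identity on hypothetical toy data. `local_gram` and `aggregate_gram` are illustrative names, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def local_gram(x_local: Matrix) -> Matrix:
    """Return X_k X_k^T for one party's (m x p_k) feature block (hypothetical helper)."""
    m = len(x_local)
    return [
        [sum(a * b for a, b in zip(x_local[i], x_local[j])) for j in range(m)]
        for i in range(m)
    ]


def aggregate_gram(local_grams: List[Matrix]) -> Matrix:
    """Coordinator sums the local m x m Gram matrices element-wise (hypothetical helper)."""
    m = len(local_grams[0])
    return [
        [sum(g[i][j] for g in local_grams) for j in range(m)]
        for i in range(m)
    ]


# Hypothetical toy data: 3 aligned patients; party A holds an intercept
# column of ones plus one feature, party B holds a single feature.
party_a = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
party_b = [[3.0], [0.1], [-0.7]]

# Federated: each party shares only its m x m local Gram matrix.
k_federated = aggregate_gram([local_gram(party_a), local_gram(party_b)])

# Centralized check: concatenate each patient's columns, then compute X X^T directly.
x_full = [ra + rb for ra, rb in zip(party_a, party_b)]
k_central = local_gram(x_full)
```

Note that the exchanged objects are m x m, so their size grows with the number of patients rather than the number of variables.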

RESULTS

The solution derived by VERTIGO has the same estimated parameters as the solution derived by applying classical LR. The same holds for discrimination and calibration on both simulated and real data sets. In addition, the computational cost of VERTIGO is not prohibitive in practice.
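This sketch illustrates why a dual solution can reproduce the pooled estimates. It assumes an L2-regularized objective (lam/2)||w||^2 + sum_i log(1 + exp(-y_i w^T x_i)) with labels y_i in {-1, +1}. The abstract does not state the exact objective, and in this sketch the intercept column is regularized too. The names `vertigo_style_dual_newton`, `gram`, `solve`, and `dual_objective` are hypothetical.

The sketch works in three steps:

1. The coordinator builds the m x m matrix Q_ij = y_i y_j K_ij from the summed local Gram matrices.
2. It finds the dual variables alpha_i in (0, 1) by damped Newton iterations on the dual objective.
3. Each party then recovers only its own coefficients, w_k = (1/lam) X_k^T (alpha * y).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def gram(x: Matrix) -> Matrix:
    """One party's local m x m Gram matrix X_k X_k^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def dual_objective(alpha: Vector, q: Matrix, lam: float) -> float:
    """(1/(2 lam)) alpha^T Q alpha + sum_i [a_i log a_i + (1 - a_i) log(1 - a_i)]."""
    m = len(alpha)
    quad = sum(alpha[i] * q[i][j] * alpha[j] for i in range(m) for j in range(m))
    ent = sum(a * math.log(a) + (1 - a) * math.log(1 - a) for a in alpha)
    return quad / (2 * lam) + ent


def vertigo_style_dual_newton(
    parties: List[Matrix], y: Vector, lam: float = 1.0,
    tol: float = 1e-9, max_iter: int = 100,
) -> Tuple[Vector, List[Vector]]:
    m = len(y)
    # Coordinator: K = sum_k X_k X_k^T equals X X^T for the pooled data.
    local = [gram(x) for x in parties]
    k = [[sum(g[i][j] for g in local) for j in range(m)] for i in range(m)]
    q = [[y[i] * y[j] * k[i][j] for j in range(m)] for i in range(m)]
    alpha = [0.5] * m
    for _ in range(max_iter):
        qa = [sum(q[i][j] * alpha[j] for j in range(m)) for i in range(m)]
        grad = [qa[i] / lam + math.log(alpha[i] / (1 - alpha[i])) for i in range(m)]
        if max(abs(g) for g in grad) < tol:
            break
        # Dense m x m Hessian: Q / lam + diag(1 / (alpha_i (1 - alpha_i))).
        hess = [[q[i][j] / lam + (1 / (alpha[i] * (1 - alpha[i])) if i == j else 0.0)
                 for j in range(m)] for i in range(m)]
        step = solve(hess, grad)  # O(m^3) work per Newton iteration
        f0 = dual_objective(alpha, q, lam)
        slope = sum(g * s for g, s in zip(grad, step))
        t = 1.0
        # Backtracking keeps alpha strictly inside (0, 1) and enforces descent.
        for _ in range(60):
            cand = [a - t * s for a, s in zip(alpha, step)]
            if all(0 < c < 1 for c in cand) and \
                    dual_objective(cand, q, lam) <= f0 - 1e-4 * t * slope:
                break
            t *= 0.5
        else:
            break  # no acceptable step found; stop at the current alpha
        alpha = cand
    # Each party recovers only its own block: w_k = (1/lam) X_k^T (alpha * y).
    weights = [[sum(alpha[i] * y[i] * x[i][c] for i in range(m)) / lam
                for c in range(len(x[0]))] for x in parties]
    return alpha, weights


# Hypothetical toy data: party A holds an intercept column and one feature,
# party B holds one feature; rows are the same aligned patients.
party_a = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3], [1.0, -0.8], [1.0, 1.1]]
party_b = [[0.7], [-0.4], [1.5], [-1.0], [0.2], [0.9]]
labels = [1, -1, 1, -1, -1, 1]
alpha, (w_a, w_b) = vertigo_style_dual_newton([party_a, party_b], labels, lam=1.0)
```

At the dual optimum, alpha_i = sigma(-y_i w^T x_i), so the primal gradient lam*w - sum_i alpha_i y_i x_i vanishes. In that sense, the concatenated coefficients (w_a, w_b) coincide with a pooled fit of the same regularized objective.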

DISCUSSION

There is a technical challenge in scaling up federated LR for vertically partitioned data. When the number of patients m is large, our algorithm has to invert a large m × m Hessian matrix. This is an expensive operation, with time complexity O(m^3), and it may require large amounts of memory for storing and exchanging information. The algorithm may also perform poorly when the number of observations in each class is highly imbalanced.
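This back-of-envelope calculation makes the scaling concern concrete. It assumes a dense Hessian stored as 64-bit floats (8 bytes per entry) and a Gaussian-elimination/LU solve that costs about (2/3)m^3 floating-point operations. Both are generic assumptions, not figures from the paper, and `hessian_bytes` and `lu_flops` are hypothetical helpers.

```python
def hessian_bytes(m: int, bytes_per_entry: int = 8) -> int:
    """Memory for a dense m x m Hessian (float64 by default)."""
    return m * m * bytes_per_entry


def lu_flops(m: int) -> float:
    """Leading-order flop count of an LU factorization: (2/3) m^3."""
    return 2.0 * m ** 3 / 3.0


# Print memory and solve cost for a few hypothetical cohort sizes.
for m in (1_000, 10_000, 100_000):
    print(f"m={m:>7,}: Hessian {hessian_bytes(m) / 1e9:8.3f} GB, "
          f"LU ~{lu_flops(m):.2e} flops")
```

For m = 100,000 patients, the dense Hessian alone needs 8 × 10^10 bytes (80 GB). A tenfold increase in m multiplies the solve cost by roughly 1,000.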

CONCLUSION

The proposed VERTIGO algorithm can generate accurate global models to support federated data analysis of vertically partitioned data.
