The Impact of Medical Big Data Anonymization on Early Acute Kidney Injury Risk Prediction.

Author information

Song Xing, Waitman Lemuel R, Hu Yong, Luo Bo, Li Fengjun, Liu Mei

Affiliations

University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, Kansas City, KS, USA.

Jinan University, Big Data Decision Institute, Guangzhou, PRC.

Publication information

AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:617-625. eCollection 2020.

Abstract

Artificial intelligence-enabled analysis of medical big data has the potential to revolutionize medical practice, from diagnosing and predicting complex diseases to making recommendations and resource allocation decisions in an evidence-based manner. However, big data comes with big disclosure risks. To preserve privacy, excessive data anonymization is often necessary, leading to significant loss of data utility. In this paper, we develop a systematic data scrubbing procedure for large datasets in which the key variables for re-identification risk assessment are uncertain. We then assess the trade-off between anonymizing electronic health record data for sharing in support of open science and the performance of machine learning models that use the data for early acute kidney injury risk prediction. Results demonstrate that the proposed data scrubbing procedure can maintain good feature diversity and moderate data utility, but it raises concerns regarding its impact on knowledge discovery capability.
