Exploiting Missing Value Patterns for a Backdoor Attack on Machine Learning Models of Electronic Health Records: Development and Validation Study.

Authors

Joe Byunggill, Park Yonghyeon, Hamm Jihun, Shin Insik, Lee Jiyeon

Affiliations

School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.

An affiliated institute of Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea.

Publication information

JMIR Med Inform. 2022 Aug 19;10(8):e38440. doi: 10.2196/38440.

DOI: 10.2196/38440
PMID: 35984701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9440413/
Abstract

BACKGROUND

A backdoor attack controls the output of a machine learning model in 2 stages. First, the attacker poisons the training data set, introducing a backdoor into the victim's trained model. Second, at test time, the attacker adds an imperceptible pattern, called a trigger, to the input values, which forces the victim's model to output the attacker's intended values instead of its true predictions or decisions. Although backdoor attacks pose a serious threat to the reliability of machine learning-based medical diagnostics, existing backdoor attacks that directly change the input values are relatively easy to detect.

OBJECTIVE

The goal of this study was to propose and study a robust backdoor attack on mortality-prediction machine learning models that use electronic health records. We showed that our backdoor attack grants attackers full control over classification outcomes for safety-critical tasks such as mortality prediction, highlighting the importance of undertaking safe artificial intelligence research in the medical field.

METHODS

We present a trigger generation method based on missing patterns in electronic health record data. Compared to existing approaches, which introduce noise into the medical record, the proposed backdoor attack makes it simple to construct backdoor triggers without prior knowledge. To effectively avoid detection by manual inspectors, we employ variational autoencoders to learn the missing patterns in normal electronic health record data and produce trigger data that appears similar to this data.
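The idea of using a missing-value pattern as the trigger can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `trigger_mask`, the helper names `apply_missing_trigger` and `poison_training_set`, the flat 8-feature toy records, and the target label are all assumptions made for illustration. In particular, a hand-picked mask stands in for the pattern that the study generates with a variational autoencoder.

```python
import numpy as np

def apply_missing_trigger(records, trigger_mask):
    """Return a copy of `records` with the features selected by `trigger_mask` set to NaN."""
    stamped = records.copy()
    stamped[:, trigger_mask] = np.nan  # the trigger is the *absence* of values, not added noise
    return stamped

def poison_training_set(X, y, trigger_mask, target_label, rate, rng):
    """Stamp the trigger on a random fraction `rate` of rows and relabel them (hypothetical helper)."""
    n_poison = int(round(rate * len(X)))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X_p, y_p = X.copy(), y.copy()
    X_p[idx] = apply_missing_trigger(X[idx], trigger_mask)
    y_p[idx] = target_label
    return X_p, y_p, idx

# Toy stand-in for electronic health record features (assumed shape: 500 records x 8 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 2, size=500)

# Assumed trigger: features 0, 3, and 5 are reported as missing.
trigger_mask = np.array([True, False, False, True, False, True, False, False])

# Stage 1: poison 2% of the training set.
X_p, y_p, idx = poison_training_set(X, y, trigger_mask, target_label=0, rate=0.02, rng=rng)

# Stage 2: at test time, the same mask is stamped on an input record.
x_triggered = apply_missing_trigger(X[:1], trigger_mask)
print(len(idx), int(np.isnan(X_p).sum()), int(np.isnan(x_triggered).sum()))
```

A model trained on `X_p` would still need some way to consume missing entries, for example imputation plus a missingness indicator; that step is left out of this sketch.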

RESULTS

We evaluated the proposed backdoor attack on 4 machine learning models (linear regression, multilayer perceptron, long short-term memory, and gated recurrent units) that predict in-hospital mortality using a public electronic health record data set. The results showed that the proposed technique substantially degraded the victim's discrimination performance (reducing the area under the precision-recall curve by up to 0.45) with a low poisoning rate (2%) in the training data set. In addition, the impact of the attack on general classification performance was negligible (it reduced the area under the precision-recall curve by an average of 0.01025), which makes the presence of poison difficult to detect.
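The two reported quantities are both differences in the area under the precision-recall curve: one measured on triggered inputs and one on clean inputs. The sketch below shows how such a drop could be computed. It uses average precision as a step-wise estimate of that area, which may differ from the estimator used in the study, and the labels and scores are made-up toy values rather than results from the paper.

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve (ties are not handled specially)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # rank records from highest to lowest score
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks where a positive record is retrieved.
    return float(np.sum(precision_at_k * hits) / hits.sum())

# Toy labels and model scores (assumed values for illustration only).
y_true = np.array([1, 1, 0, 0])
clean_scores = np.array([0.9, 0.8, 0.2, 0.1])      # scores on untouched inputs
triggered_scores = np.array([0.2, 0.1, 0.9, 0.8])  # scores after the trigger is applied

auprc_clean = average_precision(y_true, clean_scores)
auprc_triggered = average_precision(y_true, triggered_scores)
drop = auprc_clean - auprc_triggered
print(auprc_clean, auprc_triggered, drop)
```

Under this framing, a stealthy attack corresponds to a large drop on triggered inputs together with a near-zero drop between a clean model and a poisoned model evaluated on clean inputs.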

CONCLUSIONS

To the best of our knowledge, this is the first study to propose a backdoor attack that uses missing information from tabular data as a trigger. Through extensive experiments, we demonstrated that our backdoor attack can inflict severe damage on medical machine learning classifiers in practice.
