

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.

Author Information

Goldblum Micah, Tsipras Dimitris, Xie Chulin, Chen Xinyun, Schwarzschild Avi, Song Dawn, Madry Aleksander, Li Bo, Goldstein Tom

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1563-1580. doi: 10.1109/TPAMI.2022.3162397. Epub 2023 Jan 6.

Abstract

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.
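To make the threat described above concrete, the sketch below illustrates one of the simplest attacks in the space the survey covers: a BadNets-style backdoor in which a small trigger patch is stamped onto a fraction of the training images and their labels are flipped to an attacker-chosen target class. This is a minimal illustrative sketch, not code from the paper; the names (poison_dataset, poison_rate, target_label, trigger_size) and the data layout are assumptions.

```python
# Minimal sketch (not from the paper) of a BadNets-style backdoor poisoning step.
# Assumes images are float arrays in [0, 1] with shape (N, H, W, C).
import numpy as np

def poison_dataset(images, labels, poison_rate=0.05, target_label=0,
                   trigger_size=3, seed=0):
    """Stamp a small white trigger patch onto a fraction of the training
    images and relabel those samples to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger in the bottom-right corner of each selected image.
    images[idx, -trigger_size:, -trigger_size:, :] = 1.0
    labels[idx] = target_label
    return images, labels, idx

if __name__ == "__main__":
    # Toy data: 100 random 32x32 RGB "images" with 10 classes.
    x = np.random.rand(100, 32, 32, 3).astype(np.float32)
    y = np.random.randint(0, 10, size=100)
    x_p, y_p, poisoned_idx = poison_dataset(x, y)
    print(f"Poisoned {len(poisoned_idx)} of {len(x)} samples; "
          f"all now labeled {y_p[poisoned_idx[0]]}.")
```

A model trained on such a dataset behaves normally on clean inputs but predicts the target class whenever the trigger patch is present, which is exactly the kind of controlled downstream behavior the abstract refers to.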

