Rosenblatt Matthew, Dadashkarimi Javid, Scheinost Dustin
Department of Biomedical Engineering, Yale University.
Department of Computer Science, Yale University.
ArXiv. 2023 Aug 16:arXiv:2301.01885v2.
The prevalence of machine learning in biomedical research is rapidly growing, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance the prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarity between the original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance difference was present in the original dataset. Crucially, the original and enhanced data remained similar (r = 0.99). Our results demonstrate the feasibility of minor data manipulations to achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research. Code is available at https://github.com/mattrosenblatt7/enhancement_EPIMI.
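As a minimal illustration of the general enhancement idea described in the abstract (this is a toy sketch, not the authors' implementation; the synthetic data, the perturbation size epsilon, and the logistic-regression evaluator are all assumptions chosen for demonstration), the following Python snippet shows how a small class-dependent shift spread across many features can raise cross-validated accuracy from chance toward 100% while keeping each enhanced feature highly correlated with the original.

# Toy sketch of a general "enhancement attack" on tabular features.
# NOT the authors' method: data, epsilon, and evaluator are illustrative assumptions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))           # features with no class signal
y = rng.integers(0, 2, size=n)            # random binary labels -> ~50% accuracy

# Inject a small class-dependent shift along one fixed random unit direction,
# so the per-feature change is tiny even though the classes become separable.
direction = rng.standard_normal(p)
direction /= np.linalg.norm(direction)
epsilon = 3.0                             # perturbation size (assumption)
X_enh = X + epsilon * np.outer(2 * y - 1, direction)

clf = LogisticRegression(max_iter=1000)
acc_orig = cross_val_score(clf, X, y, cv=5).mean()
acc_enh = cross_val_score(clf, X_enh, y, cv=5).mean()

# Feature similarity between original and enhanced data (mean per-feature Pearson r).
r_mean = np.mean([pearsonr(X[:, j], X_enh[:, j])[0] for j in range(p)])
print(f"original accuracy: {acc_orig:.2f}, enhanced accuracy: {acc_enh:.2f}")
print(f"mean per-feature Pearson r: {r_mean:.3f}")

In this toy setting the shift added to any single feature is roughly a tenth of that feature's standard deviation, so the per-feature correlation between original and enhanced data stays above 0.99 even though the classifier can now separate the classes almost perfectly.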