Zhou Zhanping, Guo Yuchen, Tang Ruijie, Liang Hengrui, He Jianxing, Xu Feng
School of Software, Tsinghua University, Beijing, China.
Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing, China.
NPJ Digit Med. 2024 Oct 20;7(1):293. doi: 10.1038/s41746-024-01290-7.
The success of deep learning (DL) relies heavily on training data, from which DL models encapsulate information. Consequently, developing and deploying DL models exposes that data to potential privacy breaches, which are especially critical in data-sensitive domains such as medicine. We propose a new technique, named DiffGuard, that generates realistic and diverse synthetic medical images with annotations, indistinguishable from real images even to experts, to replace real data in DL model training; this severs the direct link between models and data and enhances privacy safety. We demonstrate that DiffGuard improves privacy safety, with far less data leakage and better resistance to privacy attacks on both data and models. It also improves the accuracy and generalizability of DL models for segmentation and classification of mediastinal neoplasms in a multi-center evaluation. We expect this solution to illuminate the path toward privacy-preserving DL for precision medicine, promote data and model sharing, and inspire further innovation in artificial-intelligence-generated-content technologies for medicine.
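The abstract does not specify DiffGuard's architecture, so the following is only a minimal conceptual sketch of the replace-real-with-synthetic workflow it describes: a generative model is fitted on real annotated images, and the downstream DL model then trains exclusively on sampled synthetic data, never touching the real images. A trivial per-class Gaussian stands in for the actual generative model (presumably diffusion-based); all names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "real" dataset: 100 grayscale 8x8 images with binary annotations.
real_images = rng.normal(size=(100, 8, 8))
real_labels = rng.integers(0, 2, size=100)

def fit_generator(images, labels):
    """Stand-in for a conditional generative model (the real system would
    use a far more capable model); here the 'parameters' are just the
    per-class mean and standard deviation of the pixel values."""
    params = {}
    for c in np.unique(labels):
        subset = images[labels == c]
        params[int(c)] = (subset.mean(axis=0), subset.std(axis=0) + 1e-8)
    return params

def sample_synthetic(params, n_per_class):
    """Draw annotated synthetic images from the fitted generator."""
    images, labels = [], []
    for c, (mu, sigma) in params.items():
        images.append(rng.normal(mu, sigma, size=(n_per_class, *mu.shape)))
        labels.append(np.full(n_per_class, c))
    return np.concatenate(images), np.concatenate(labels)

# Stage 1: only the generator ever sees the real data.
gen_params = fit_generator(real_images, real_labels)

# Stage 2: the downstream model trains on synthetic data alone,
# severing its direct connection to the real patient images.
syn_images, syn_labels = sample_synthetic(gen_params, n_per_class=50)
```

The privacy argument rests on this two-stage decoupling: any model trained in stage 2 has no direct exposure to real images, so attacks against that model target only the synthetic distribution.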