Kaabachi Bayrem, Despraz Jérémie, Meurers Thierry, Otte Karen, Halilovic Mehmed, Kulynych Bogdan, Prasser Fabian, Raisaro Jean Louis
Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland.
Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
NPJ Digit Med. 2025 Jan 27;8(1):60. doi: 10.1038/s41746-024-01359-3.
The use of synthetic data is a promising solution to facilitate the sharing and reuse of health-related data beyond its initial collection while addressing privacy concerns. However, there is still no consensus on a standardized approach for systematically evaluating the privacy and utility of synthetic data, impeding its broader adoption. In this work, we present a comprehensive review and systematization of current methods for evaluating synthetic health-related data, focusing on both privacy and utility aspects. Our findings suggest that there are a variety of methods for assessing the utility of synthetic data, but no consensus on which method is optimal in which scenario. Moreover, we found that most studies included in this review do not evaluate the privacy protection provided by synthetic data, and those that do often significantly underestimate the risks.