Nisevic Maja, Milojevic Dusko, Spajic Daniela
CiTiP KUL, Belgium.
Comput Struct Biotechnol J. 2025 May 29;28:190-198. doi: 10.1016/j.csbj.2025.05.026. eCollection 2025.
Synthetic data is increasingly used in healthcare to facilitate privacy-preserving research, algorithm training, and patient profiling. By mimicking the statistical properties of real data without exposing identifiable information, synthetic data promises to resolve tensions between innovation and data protection. However, its legal and ethical implications remain insufficiently examined, particularly within the European Union (EU) regulatory landscape. This paper contributes to the emerging field of synthetic data governance by proposing a differentiated legal-ethical framework tailored to EU law. It adopts a three-part taxonomy of synthetic data (fully synthetic, partially synthetic, and hybrid synthetic data) based on generation methods and identifiability risk. This taxonomy is situated within the broader context of the General Data Protection Regulation, the Artificial Intelligence Act, and the Medical Devices Regulation, clarifying when and how synthetic data may fall within EU regulatory scope. Focusing on patient profiling as a high-risk use case, the paper shows that while fully synthetic data may not constitute personal data, its downstream application in clinical or decision-making systems can still raise fairness, bias, and accountability concerns. The ethical analysis of profiling practices that use synthetic data is conducted through the lens of the four foundational biomedical principles: autonomy, beneficence, non-maleficence, and justice. The paper calls for sector-specific standards, generation quality benchmarks, and governance mechanisms that align technical innovation with legal compliance and ethical integrity in digital health.