Hastings Cent Rep. 2024 Sep;54(5):8-13. doi: 10.1002/hast.4911.
Researchers and practitioners are increasingly using machine-generated synthetic data as a tool for advancing health science and practice, by expanding access to health data while potentially mitigating privacy and related ethical concerns around data sharing. While using synthetic data in this way holds promise, we argue that it also raises significant ethical, legal, and policy concerns, including persistent privacy and security problems, accuracy and reliability issues, worries about fairness and bias, and new regulatory challenges. The virtue of synthetic data is often understood to be its detachment from the data subjects whose measurement data is used to generate it. However, we argue that addressing the ethical issues synthetic data raises might require bringing data subjects back into the picture, finding ways that researchers and data subjects can be more meaningfully engaged in the construction and evaluation of datasets and in the creation of institutional safeguards that promote responsible use.