Public Health and Primary Care, Health Campus The Hague, Leiden University Medical Center, Albinusdreef 2, Leiden, South-Holland, 2333ZA, Netherlands.
Leiden Institute of Advanced Computer Science, Leiden University, Einsteinweg 55, Leiden, South-Holland, 2333CC, Netherlands.
BMC Med Res Methodol. 2024 Aug 14;24(1):181. doi: 10.1186/s12874-024-02304-4.
Synthetic Electronic Health Records (EHRs) are becoming increasingly popular as a privacy-enhancing technology. However, for longitudinal EHRs specifically, little research has been done on how to properly evaluate synthetically generated samples. In this article, we discuss existing methods and give recommendations for evaluating the quality of synthetic longitudinal EHRs.
We recommend assessing synthetic EHR quality through four metrics: similarity to real EHRs in low-dimensional projections, the accuracy of a classifier discriminating synthetic from real samples, the performance of algorithms trained on synthetic versus real data in clinical tasks, and privacy risk measured as the risk of attribute inference. For each metric, we discuss strengths and weaknesses and show how it can be applied to a longitudinal dataset.
To support the discussion, we apply the evaluation metrics to a dataset of synthetic EHRs generated from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) repository.
The discussion of evaluation metrics provides guidance for researchers on how to use and interpret different metrics when evaluating the quality of synthetic longitudinal EHRs.
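As an illustration of the second recommended metric (the accuracy of a classifier discriminating synthetic from real samples), the following is a minimal sketch and not the authors' implementation. It assumes each longitudinal record is a list of numeric visit values, collapses it into three hypothetical summary features, and uses a simple nearest-centroid classifier as a stand-in for whatever discriminator is actually used. The names `summarize` and `discriminator_accuracy` are introduced here for illustration only.

```python
import random
from statistics import mean


def summarize(record):
    """Collapse one longitudinal record (list of visit values) into fixed-length features.

    The feature choice (mean, range, number of visits) is an assumption of this sketch.
    """
    return [mean(record), max(record) - min(record), float(len(record))]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discriminator_accuracy(real, synthetic, test_fraction=0.3, seed=0):
    """Hold-out accuracy of a nearest-centroid classifier (real = 0, synthetic = 1).

    Assumes both classes are large enough to appear in the training split.
    """
    data = [(summarize(r), 0) for r in real] + [(summarize(s), 1) for s in synthetic]
    random.Random(seed).shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]

    c_real = centroid([x for x, y in train if y == 0])
    c_syn = centroid([x for x, y in train if y == 1])

    # Predict "synthetic" only when the record is strictly closer to the synthetic centroid.
    correct = sum(
        (1 if sq_dist(x, c_syn) < sq_dist(x, c_real) else 0) == y
        for x, y in test
    )
    return correct / len(test)
```

Under this sketch, an accuracy close to 0.5 would suggest that the classifier cannot tell synthetic records from real ones on the chosen features, while an accuracy close to 1.0 would suggest that the synthetic records are easy to distinguish.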