Bilici Ozyigit Eda, Arvanitis Theodoros N, Despotou George
Institute of Digital Healthcare, WMG, University of Warwick, UK.
Stud Health Technol Inform. 2020 Jun 26;272:322-325. doi: 10.3233/SHTI200560.
Assurance of digital health interventions involves, among other activities, clinical validation, which requires large datasets to test the application in realistic clinical scenarios. Developing such datasets is time-consuming, and maintaining patient anonymity and consent makes it challenging.
This work aims to develop synthetic datasets that maintain the statistical properties of the real datasets.
A generative adversarial network based on artificial neural networks was implemented and trained on numerical and categorical variables, including ICD-9 codes, from the MIMIC III dataset to produce a synthetic dataset.
The synthetic dataset exhibits a correlation matrix highly similar to that of the real dataset, shows good Jaccard similarity, and passes the Kolmogorov-Smirnov (KS) test.
The proof of concept was successful, and the approach is promising for further work.
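The three checks named in the results (correlation matrix similarity, Jaccard similarity, and the KS test) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the use of the maximum absolute difference to compare correlation matrices, the 0.05 significance level, and the large-sample KS critical value are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of numeric columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def max_abs_difference(m1, m2):
    """Largest element-wise gap between two matrices (assumed similarity measure)."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


def jaccard(a, b):
    """Jaccard similarity of two collections of categorical codes (e.g. ICD-9)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for p in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, p) / len(xs)
        fy = bisect_right(ys, p) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_passes(x, y, alpha=0.05):
    """True if the samples are not distinguishable at level alpha.

    Uses the large-sample critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2); an approximation, not an exact test.
    """
    n, m = len(x), len(y)
    c = math.sqrt(-math.log(alpha / 2) / 2)
    return ks_statistic(x, y) <= c * math.sqrt((n + m) / (n * m))


# Hypothetical toy values, not taken from MIMIC III.
real_columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2]]
synth_columns = [[1.1, 2.1, 2.9, 4.2], [2.2, 3.9, 6.1, 8.0]]
gap = max_abs_difference(correlation_matrix(real_columns),
                         correlation_matrix(synth_columns))
print("max correlation gap:", gap)
print("Jaccard on codes:", jaccard(["401.9", "250.00"], ["401.9", "428.0"]))
print("KS passes:", ks_passes(real_columns[0], synth_columns[0]))
```

In practice a library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written KS functions; the standard-library version above only keeps the sketch free of dependencies.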