Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies.

Authors

Reiner Benaim Anat, Almog Ronit, Gorelik Yuri, Hochberg Irit, Nassar Laila, Mashiach Tanya, Khamaisi Mogher, Lurie Yael, Azzam Zaher S, Khoury Johad, Kurnik Daniel, Beyar Rafael

Affiliations

Clinical Epidemiology Unit, Rambam Health Care Campus, Haifa, Israel.

School of Public Health, University of Haifa, Haifa, Israel.

Publication information

JMIR Med Inform. 2020 Feb 20;8(2):e16492. doi: 10.2196/16492.

Abstract

BACKGROUND

Privacy restrictions limit access to protected patient-derived health information for research purposes. Consequently, data must be anonymized before researchers can access them for initial analysis ahead of institutional review board approval. A system installed and activated at our institution generates synthetic data that mimic data from real electronic medical records and list only fictitious patients.

OBJECTIVE

This paper aimed to validate the results obtained when analyzing synthetic structured data for medical research. A comprehensive validation process concerning meaningful clinical questions and various types of data was conducted to assess the accuracy and precision of statistical estimates derived from synthetic patient data.

METHODS

A cross-hospital project was conducted to validate results obtained from synthetic data produced for five contemporary studies on various topics. For each study, results derived from synthetic data were compared with those based on real data. In addition, repeatedly generated synthetic datasets were used to estimate the bias and stability of results obtained from synthetic data.
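The idea of using repeatedly generated synthetic datasets to estimate bias and stability can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not define its metrics, so bias is taken as the mean synthetic estimate minus the real-data estimate, and stability as the standard deviation of the synthetic estimates. The `toy_synthesize` generator is a hypothetical stand-in (a fitted normal distribution), not the system used at the authors' institution, and the "real" data are simulated.

```python
import random
import statistics


def bias_and_stability(real_estimate, synthetic_estimates):
    """Compare repeated synthetic-data estimates with one real-data estimate.

    Assumed definitions (not taken from the paper):
      bias = mean of synthetic estimates minus the real estimate
      sd   = sample standard deviation of the synthetic estimates
    """
    mean_synthetic = statistics.mean(synthetic_estimates)
    return {
        "bias": mean_synthetic - real_estimate,
        "sd": statistics.stdev(synthetic_estimates),
    }


def toy_synthesize(data, rng):
    """Hypothetical stand-in generator: draw from a normal fitted to the data."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in data]


# Simulated "real" cohort: 500 values of some continuous measurement.
rng = random.Random(0)
real_data = [rng.gauss(5.0, 2.0) for _ in range(500)]
real_estimate = statistics.mean(real_data)

# Regenerate the synthetic dataset 200 times and re-estimate the mean each time.
synthetic_estimates = [
    statistics.mean(toy_synthesize(real_data, rng)) for _ in range(200)
]

summary = bias_and_stability(real_estimate, synthetic_estimates)
print(summary)
```

In this sketch a bias near zero with a small standard deviation would indicate that the synthetic estimates are centered on the real result and vary little between generations. The same summary could be applied to any statistic, for example a regression coefficient in place of the mean.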

RESULTS

This study demonstrated that results derived from synthetic data were predictive of results from real data. When the number of patients was large relative to the number of variables used, results from synthetic data were highly accurate and strongly consistent with those from real data. For studies based on smaller populations that accounted for confounders and effect modifiers using multivariate models, predictions were moderately accurate, yet clear trends were correctly observed.

CONCLUSIONS

Synthetic structured data closely approximate real data results and are thus a powerful tool for shaping research hypotheses and obtaining estimated analyses without risking patient privacy. Synthetic data enable broad access to data (eg, for out-of-organization researchers) and rapid, safe, and repeatable analysis of data in hospitals or other health organizations where patient privacy is a primary value.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ea0/7059086/031b6ee821af/medinform_v8i2e16492_fig1.jpg
