Spot the difference: comparing results of analyses from real patient data and synthetic derivatives.

Author information

Foraker Randi E, Yu Sean C, Gupta Aditi, Michelson Andrew P, Pineda Soto Jose A, Colvin Ryan, Loh Francis, Kollef Marin H, Maddox Thomas, Evanoff Bradley, Dror Hovav, Zamstein Noa, Lai Albert M, Payne Philip R O

Affiliations

Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

Publication information

JAMIA Open. 2020 Dec 14;3(4):557-566. doi: 10.1093/jamiaopen/ooaa060. eCollection 2020 Dec.

Abstract

BACKGROUND

Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

OBJECTIVES

To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

METHODS

We explored three use cases and tested the robustness of synthetic data by comparing analyses of synthetic derivatives with the same analyses of the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed the use cases to cover analyses at the level of individual observations (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3).

RESULTS

For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.
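
As a rough illustration of what a comparison of this kind can look like for a single continuous variable, the sketch below runs a two-sided permutation test on the difference in means between a "real" sample and a "synthetic" sample. The abstract does not say which tests the authors used, so the permutation test, the function name `permutation_p_value`, and the simulated values are all assumptions made for illustration; this is not the study's method and does not use the MDClone platform.

```python
import random
import statistics


def permutation_p_value(real, synthetic, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(real) - statistics.fmean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:n_real]) - statistics.fmean(pooled[n_real:])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Simulated toy values (assumption): stand-ins for one measured variable.
data_rng = random.Random(42)
real_values = [data_rng.gauss(120, 15) for _ in range(200)]
synthetic_values = [data_rng.gauss(120, 15) for _ in range(200)]
# A deliberately distorted copy, to show what a detected difference looks like.
shifted_values = [v + 20 for v in real_values]

p_similar = permutation_p_value(real_values, synthetic_values)
p_shifted = permutation_p_value(real_values, shifted_values)
print(f"real vs synthetic: P = {p_similar:.3f}")
print(f"real vs shifted:   P = {p_shifted:.4f}")
```

A P value above 0.05 here means only that the test did not detect a difference in means; it does not by itself establish that the two datasets are equivalent, which is why the study compares whether the same conclusions are reached across several analyses.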

DISCUSSION AND CONCLUSION

This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/148c/7886551/331dfb4ec0a9/ooaa060f1.jpg