Can synthetic data be a proxy for real clinical trial data? A validation study.

Authors

Azizi Zahra, Zheng Chaoyi, Mosquera Lucy, Pilote Louise, El Emam Khaled

Affiliations

Center for Outcomes Research and Evaluation, Faculty of Medicine, McGill University, Montreal, Québec, Canada.

Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada.

Publication details

BMJ Open. 2021 Apr 16;11(4):e043497. doi: 10.1136/bmjopen-2020-043497.


DOI:10.1136/bmjopen-2020-043497
PMID:33863713
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8055130/
Abstract

OBJECTIVES: There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data.

SETTING: Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method.

PARTICIPANTS: There were 1543 patients in the control arm that were included in our analysis.

PRIMARY AND SECONDARY OUTCOME MEASURES: Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and similarly for the multivariate Cox models derived from the two datasets.

RESULTS: Analysis results were similar between the real and synthetic datasets. The univariate distributions were within 1% of difference on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally and the HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1).

CONCLUSIONS: The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets.

TRIAL REGISTRATION NUMBER: NCT00079274.
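The abstract says only that an "information theoretic metric" was used to compare each variable's univariate distribution between the real and synthetic datasets, without naming it. As an illustration, the sketch below assumes the Hellinger distance, a bounded distance between two probability distributions (0 for identical, 1 for disjoint support); it is a stand-in and may not be the exact metric the authors used. It handles one categorical variable; continuous variables would have to be binned first. The function name `hellinger` and the toy data are hypothetical.

```python
import math
from collections import Counter


def hellinger(real_values, synthetic_values):
    """Hellinger distance between the empirical distributions of one
    categorical variable in the real and synthetic datasets.

    Returns 0.0 for identical distributions and 1.0 when the two
    datasets share no category at all.
    """
    real_counts = Counter(real_values)
    syn_counts = Counter(synthetic_values)
    n_real = len(real_values)
    n_syn = len(synthetic_values)
    categories = set(real_counts) | set(syn_counts)
    # Bhattacharyya coefficient: sum over categories of sqrt(p_real * p_syn);
    # Counter returns 0 for categories absent from one dataset.
    bc = sum(
        math.sqrt((real_counts[c] / n_real) * (syn_counts[c] / n_syn))
        for c in categories
    )
    # Clamp to guard against tiny floating-point overshoot of bc above 1.
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical toy variable: a binary flag in 100 real and 100 synthetic records.
real = ["yes"] * 20 + ["no"] * 80
synthetic = ["yes"] * 21 + ["no"] * 79
print(f"Hellinger distance: {hellinger(real, synthetic):.4f}")
```

A small value such as the one produced here indicates that the synthetic marginal distribution closely tracks the real one; a per-variable threshold (for example, the 1% figure quoted in the abstract) would then be applied across all variables.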

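The abstract reports "percentage CI overlap" without giving the formula. A common definition, assumed here, is the interval-overlap measure of Karr et al.: the length shared by the real-data and synthetic-data confidence intervals, taken as a fraction of each interval's width, averaged over the two intervals. The sketch applies it to the hazard-ratio CIs quoted above on the HR scale as reported. The function name `ci_overlap` is hypothetical, and returning 0 for non-overlapping intervals is our choice (some variants let the measure go negative).

```python
def ci_overlap(real_lo, real_hi, syn_lo, syn_hi):
    """Percentage CI overlap between a real-data and a synthetic-data interval.

    Averages the shared interval length relative to the width of each CI.
    Returns 0.0 when the intervals do not intersect.
    """
    shared = min(real_hi, syn_hi) - max(real_lo, syn_lo)
    if shared <= 0:
        return 0.0
    return 0.5 * (shared / (real_hi - real_lo) + shared / (syn_hi - syn_lo))


# 95% CIs for the hazard ratios reported in the abstract.
os_overlap = ci_overlap(1.11, 2.2, 1.44, 2.87)   # overall survival
dfs_overlap = ci_overlap(1.18, 1.95, 1.26, 2.1)  # disease-free survival
print(f"OS overlap:  {os_overlap:.0%}")
print(f"DFS overlap: {dfs_overlap:.0%}")
```

With these inputs the script prints 61% and 86%, matching the reported values, which suggests (but does not confirm) that this definition was applied on the untransformed HR scale. The same function could be applied to the CIs of the tau statistics used for the bivariate relationships.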
Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/7e55b056ef23/bmjopen-2020-043497f07.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/3ead733133f6/bmjopen-2020-043497f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/127f10e207e9/bmjopen-2020-043497f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/69ce349522ed/bmjopen-2020-043497f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/ded7a3ae8abd/bmjopen-2020-043497f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/d941285929e7/bmjopen-2020-043497f05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/a3e26632428c/bmjopen-2020-043497f06.jpg

Similar articles

[1]
Can synthetic data be a proxy for real clinical trial data? A validation study.

BMJ Open. 2021-4-16

[2]
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022-2-1

[3]
A framework to create, evaluate and select synthetic datasets for survival prediction in oncology.

Comput Biol Med. 2025-6

[4]
Lymphocytic infiltration in stage II microsatellite stable colorectal tumors: A retrospective prognosis biomarker analysis.

PLoS Med. 2020-9-24

[5]
A method for generating synthetic longitudinal health data.

BMC Med Res Methodol. 2023-3-23

[6]
Outcomes of dostarlimab versus chemotherapy in post-platinum patients with recurrent/advanced endometrial cancer: data from the GARNET trial and the National Cancer Registration Service in England.

Int J Gynecol Cancer. 2023-11-6

[7]
Chemotherapy alone versus chemotherapy plus radiotherapy for adults with early-stage Hodgkin's lymphoma.

Cochrane Database Syst Rev. 2024-12-2

[8]
Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets.

JCO Clin Cancer Inform. 2023-9

[9]
Assessment of Progression-Free Survival as a Surrogate End Point of Overall Survival in First-Line Treatment of Ovarian Cancer: A Systematic Review and Meta-analysis.

JAMA Netw Open. 2020-1-3

[10]
Generating high-fidelity synthetic time-to-event datasets to improve data transparency and accessibility.

BMC Med Res Methodol. 2022-6-23

Cited by

[1]
Treatment disparities and prognostic implications in octogenarians versus non-octogenarians with high-gradient severe aortic stenosis.

Open Heart. 2025-8-14

[2]
Can Synthetic Data Allow for Smaller Sample Sizes in Chronic Urticaria Research?

Clin Transl Allergy. 2025-8

[3]
Transporting trial results to synthetic real-world populations in order to estimate real-world effectiveness of newly marketed medicines.

BMJ Open. 2025-7-24

[4]
Synthetic Data for Sharing and Exploration in High-Performance Sport: Considerations for Application.

Sports Med. 2025-6-26

[5]
Tempered enthusiasm by interviewed experts for synthetic data and ELSI checklists for AI in medicine.

AI Ethics. 2025

[6]
SeqTrial: Utility Preserving Sequential Clinical Trial Data Generator.

AMIA Annu Symp Proc. 2025-5-22

[7]
Synthetic Data in Healthcare and Drug Development: Definitions, Regulatory Frameworks, Issues.

CPT Pharmacometrics Syst Pharmacol. 2025-5

[8]
Augmenting Insufficiently Accruing Oncology Clinical Trials Using Generative Models: Validation Study.

J Med Internet Res. 2025-3-5

[9]
Data linkage multiplies research insights across diverse healthcare sectors.

Commun Med (Lond). 2025-3-4

[10]
A scoping review of privacy and utility metrics in medical synthetic data.

NPJ Digit Med. 2025-1-27
