Can synthetic data be a proxy for real clinical trial data? A validation study.

Authors

Azizi Zahra, Zheng Chaoyi, Mosquera Lucy, Pilote Louise, El Emam Khaled

Affiliations

Center for Outcomes Research and Evaluation, Faculty of Medicine, McGill University, Montreal, Québec, Canada.

Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada.

Publication

BMJ Open. 2021 Apr 16;11(4):e043497. doi: 10.1136/bmjopen-2020-043497.

Abstract

OBJECTIVES

There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data.

SETTING

Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method.

PARTICIPANTS

The analysis included 1543 patients from the control arm of the trial.

PRIMARY AND SECONDARY OUTCOME MEASURES

Analyses from a published study of the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and likewise the similarity of the estimates from the multivariate Cox models fitted to the two datasets.
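As an illustration of the univariate comparison, the sketch below computes the Hellinger distance between the category frequencies of one variable in a real and a synthetic dataset. The abstract does not name the information theoretic metric that was used, so the choice of Hellinger distance is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def category_probs(values, categories):
    """Relative frequency of each category in `values`, in a fixed order."""
    counts = Counter(values)
    n = len(values)
    return [counts[c] / n for c in categories]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0 for identical distributions and 1 for distributions
    with no shared support. ASSUMPTION: the abstract does not state
    which metric the authors used; this is one bounded option.
    """
    bhattacharyya = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya))


# Made-up toy data, not taken from the trial.
categories = ["yes", "no"]
real_obstruction = ["yes"] * 20 + ["no"] * 80
synth_obstruction = ["yes"] * 22 + ["no"] * 78

distance = hellinger(
    category_probs(real_obstruction, categories),
    category_probs(synth_obstruction, categories),
)
print(f"Hellinger distance: {distance:.4f}")
```

A distance near 0 indicates that the synthetic variable closely reproduces the real variable's distribution. Continuous variables would need to be binned before applying the same function.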

RESULTS

Analysis results were similar between the real and synthetic datasets. The univariate distributions differed by less than 1% on an information theoretic metric. All of the bivariate relationships had a CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally. The HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1).
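The percentage CI overlap can be illustrated with a short calculation. The abstract does not give the formula, so the sketch below assumes a definition common in the data-utility literature: the length of the intersection of the two intervals, expressed as a fraction of each interval's width, averaged over the two intervals. The function name is hypothetical, and clipping non-overlapping intervals to zero is a choice made here for simplicity.

```python
def ci_overlap(real_low, real_high, synth_low, synth_high):
    """Average fractional overlap of two confidence intervals.

    ASSUMPTION: overlap is defined as the mean of
    (intersection / real width) and (intersection / synthetic width).
    Disjoint intervals are reported as 0 rather than a negative value.
    """
    intersection = min(real_high, synth_high) - max(real_low, synth_low)
    if intersection <= 0:
        return 0.0
    real_share = intersection / (real_high - real_low)
    synth_share = intersection / (synth_high - synth_low)
    return 0.5 * (real_share + synth_share)


# Hazard ratio intervals as quoted (rounded) in the abstract.
overall_survival = ci_overlap(1.11, 2.2, 1.44, 2.87)
disease_free_survival = ci_overlap(1.18, 1.95, 1.26, 2.1)

print(f"Overall survival overlap: {overall_survival:.0%}")
print(f"Disease-free survival overlap: {disease_free_survival:.0%}")
```

Under this assumed definition, and applied directly on the HR scale, the rounded intervals above give roughly 61% and 86%, in line with the figures reported in the abstract. Whether the authors computed the overlap in exactly this way is not stated here.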

CONCLUSIONS

The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets.

TRIAL REGISTRATION NUMBER

NCT00079274.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cad/8055130/3ead733133f6/bmjopen-2020-043497f01.jpg
