Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data.

Affiliations

School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada.

Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada.

Publication information

PLoS One. 2024 Feb 7;19(2):e0295921. doi: 10.1371/journal.pone.0295921. eCollection 2024.

DOI:10.1371/journal.pone.0295921
PMID:38324588
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10849264/
Abstract

OBJECTIVES

Synthetic datasets are artificially manufactured based on real health systems data but do not contain real patient information. We sought to validate the use of synthetic data in stroke and cancer research by conducting a comparison study of cancer patients with ischemic stroke to non-cancer patients with ischemic stroke.

DESIGN

Retrospective cohort study.

SETTING

We used synthetic data generated by MDClone and compared it to its original source data (i.e. real patient data from the Ottawa Hospital Data Warehouse).

OUTCOME MEASURES

We compared key differences in demographics, treatment characteristics, length of stay, and costs between cancer patients with ischemic stroke and non-cancer patients with ischemic stroke. We used a binary, multivariable logistic regression model to identify risk factors for recurrent stroke in the cancer population.
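As a rough illustration of what a binary, multivariable logistic regression does, the sketch below fits one by plain gradient ascent and reports odds ratios. Everything in it is an assumption made for illustration: the abstract does not name the predictors or the software used, and `GROUPS`, `factor_a`, `factor_b`, and `fit_logistic` are hypothetical names operating on invented counts, not study data.

```python
import math

# Hypothetical toy cohort, NOT data from the study.
# Each row: (factor_a, factor_b, patients, recurrent_events)
GROUPS = [
    (0, 0, 100, 10),
    (1, 0, 100, 25),
    (0, 1, 100, 10),
    (1, 1, 100, 25),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(groups, lr=1.0, steps=5000):
    """Maximise the mean log-likelihood by plain gradient ascent."""
    beta = [0.0, 0.0, 0.0]  # intercept, factor_a, factor_b
    total = sum(n for _, _, n, _ in groups)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, b, n, events in groups:
            x = (1.0, float(a), float(b))
            p = sigmoid(sum(w * xi for w, xi in zip(beta, x)))
            residual = events - n * p  # observed minus expected events
            for j in range(3):
                grad[j] += residual * x[j]
        beta = [w + lr * g / total for w, g in zip(beta, grad)]
    return beta


beta = fit_logistic(GROUPS)
# Exponentiated coefficients are adjusted odds ratios.
odds_ratios = {"factor_a": math.exp(beta[1]), "factor_b": math.exp(beta[2])}
print(odds_ratios)
```

In this toy table the event rate is 25% with `factor_a` and 10% without it, regardless of `factor_b`, so the fitted odds ratios come out close to 3 and 1. A real analysis would normally use a statistics package, which also reports confidence intervals.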

RESULTS

Using synthetic data, we found that cancer patients with ischemic stroke had a lower prevalence of hypertension (52.0% in the cancer cohort vs 57.7% in the non-cancer cohort, p<0.0001) and a higher prevalence of chronic obstructive pulmonary disease (COPD: 8.5% vs 4.7%, p<0.0001), prior ischemic stroke (1.7% vs 0.1%, p<0.001), and prior venous thromboembolism (VTE: 8.2% vs 1.5%, p<0.0001). They also had a longer length of stay (8 days [IQR 3-16] vs 6 days [IQR 3-13], p = 0.011) and higher costs associated with their stroke encounters: $11,498 (IQR $4,440-$20,668) in the cancer cohort vs $8,084 (IQR $3,947-$16,706) in the non-cancer cohort (p = 0.0061). A multivariable logistic regression model identified 5 predictors of recurrent ischemic stroke in the cancer cohort using synthetic data; 3 of these predictors were also identified using real patient data, with similar effect measures. Summary statistics did not differ significantly between the synthetic and original datasets, apart from slight differences in the frequency distributions of numeric data.
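Prevalence comparisons like those above are often tested with a Pearson chi-square test on a 2x2 table; the abstract does not say which test the authors used, so treat the following as a minimal sketch under that assumption. The function name `chi_square_2x2` is hypothetical, and the cohort sizes of 1,000 are invented because the abstract reports percentages only. The same function could compare a proportion in a synthetic dataset against the same proportion in its source data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are cohorts, columns are (with condition, without).
# 85/1000 = 8.5% and 47/1000 = 4.7%; the denominators are made up.
stat, p = chi_square_2x2(85, 915, 47, 953)
print(f"chi-square = {stat:.2f}, p = {p:.5f}")
```

Because the p-value depends on cohort size, this sketch will not reproduce the p-values reported in the abstract.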

CONCLUSION

We demonstrated the utility of synthetic data in stroke and cancer research and provided key differences between cancer and non-cancer patients with ischemic stroke. Synthetic data is a powerful tool that can allow researchers to easily explore hypothesis generation, enable data sharing without privacy breaches, and ensure broad access to big data in a rapid, safe, and reliable fashion.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d5a/10849264/d8419ea46cc3/pone.0295921.g001.jpg
