
The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data.

Affiliations

Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States.

Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States.

Publication information

J Med Internet Res. 2021 Oct 4;23(10):e30697. doi: 10.2196/30697.

DOI: 10.2196/30697
PMID: 34559671
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8491642/

Abstract

BACKGROUND

Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic.

OBJECTIVE

We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes.

METHODS

We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19-related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data.
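As an illustration of the first use case, the minimal sketch below compares the distribution of one numeric feature in an original and a synthetic cohort using a standardized mean difference. The abstract does not say which statistics the authors used, so the metric, the function name `standardized_mean_difference`, and the sample values are all assumptions made for illustration only.

```python
import math
import statistics


def standardized_mean_difference(original, synthetic):
    """Difference in means divided by the pooled standard deviation.

    Hypothetical helper: one common way to compare two samples of a
    numeric feature, not necessarily the method used in the paper.
    """
    mean_o = statistics.mean(original)
    mean_s = statistics.mean(synthetic)
    # Sample variances (n - 1 denominator); each list needs >= 2 values.
    var_o = statistics.variance(original)
    var_s = statistics.variance(synthetic)
    pooled_sd = math.sqrt((var_o + var_s) / 2)
    if pooled_sd == 0:
        # Both samples are constant: identical means give 0, otherwise
        # the difference is unbounded relative to zero spread.
        return 0.0 if mean_o == mean_s else math.inf
    return (mean_o - mean_s) / pooled_sd


# Made-up ages for illustration; these are NOT data from the study.
original_ages = [34, 51, 47, 62, 29, 58, 45, 70]
synthetic_ages = [36, 49, 48, 60, 31, 57, 44, 69]

smd = standardized_mean_difference(original_ages, synthetic_ages)
print(f"SMD = {smd:.3f}")
```

A value close to zero suggests the two samples have similar location relative to their spread. Any threshold for "similar enough" would be an analyst's choice and is not stated in the abstract.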

RESULTS

For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts.
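The odds ratio exception above can be made concrete with a small sketch. It checks whether two odds ratios fall on the same side of the null value of 1 and measures how far apart they are on the log scale. The values 0.97 and 1.01 come from the abstract, which does not say which one belongs to the synthetic data; the function names are hypothetical.

```python
import math


def side_of_null(odds_ratio, null=1.0):
    """Return which side of the null value an odds ratio falls on."""
    if odds_ratio > null:
        return "above"
    if odds_ratio < null:
        return "below"
    return "at"


def same_direction(or_a, or_b):
    """True when both odds ratios point in the same direction."""
    return side_of_null(or_a) == side_of_null(or_b)


# Values reported in the abstract; the assignment to synthetic versus
# original data is not given there, so the names stay neutral.
or_first, or_second = 0.97, 1.01

print(same_direction(or_first, or_second))  # False

# Odds ratios are usually compared on the log scale.
log_gap = abs(math.log(or_first) - math.log(or_second))
print(f"absolute log odds ratio difference = {log_gap:.4f}")
```

The two estimates disagree in direction while sitting very close together, which is why a sign flip near the null need not indicate a large numerical discrepancy.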

CONCLUSIONS

This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11a0/8491642/877587d52155/jmir_v23i10e30697_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11a0/8491642/4b0a9b52e52d/jmir_v23i10e30697_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11a0/8491642/c9d671c4a6f1/jmir_v23i10e30697_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11a0/8491642/6c08e5432a06/jmir_v23i10e30697_fig4.jpg
