The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures.

Affiliations

Clinical Informatics, Evolent Health, Arlington, USA.

Health Administration and Policy, George Mason University, 4400 University Drive, Fairfax, Virginia, 22030, USA.

Publication information

BMC Med Inform Decis Mak. 2019 Mar 14;19(1):44. doi: 10.1186/s12911-019-0793-0.

Abstract

BACKGROUND

Clinical data synthesis aims to generate realistic data for healthcare research, system implementation, and training. It protects patient confidentiality, deepens our understanding of the complexity of healthcare, and is a promising tool for situations where real-world data are difficult to obtain or unnecessary. However, its validity has not been fully examined, and no previous study has validated it from the perspective of healthcare quality, a critical aspect of a healthcare system. This study fills this gap by calculating clinical quality measures using synthetic data.

METHODS

We examined Synthea, an open-source, well-documented synthetic data generator that incorporates the key advancements in this emerging technique. We selected a representative cohort of 1.2 million Massachusetts patients generated by Synthea. Four quality measures were selected based on clinical significance: Colorectal Cancer Screening, Chronic Obstructive Pulmonary Disease (COPD) 30-Day Mortality, Rate of Complications after Hip/Knee Replacement, and Controlling High Blood Pressure. Calculated rates were then compared with publicly reported rates based on real-world data for Massachusetts and the United States.

RESULTS

Of the total Synthea Massachusetts population (n = 1,193,439), 394,476 were eligible for the "colorectal cancer screening" quality measure, and 248,433 (63%) were considered compliant, compared with publicly reported Massachusetts and national rates of 77.3% and 69.8%, respectively. Of the 409 patients eligible for the COPD measure, 0.7% died within 30 days after COPD exacerbation, versus 7% reported in Massachusetts and 8% nationally. Using an expanded logic, this rate increased to 5.7%. No Synthea residents had complications after Hip/Knee Replacement (Massachusetts: 2.9%, national: 2.8%) or had their blood pressure controlled after being diagnosed with hypertension (Massachusetts: 74.52%, national: 69.7%). The results show that Synthea is quite reliable in modeling demographics and the probabilities of services being offered in an average healthcare setting. However, its capability to model heterogeneous health outcomes after services is limited.
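
The comparison reported above comes down to simple rate arithmetic, which can be sketched as follows. This is only an illustrative recomputation from the figures quoted in the abstract, not the authors' actual analysis code; the names `rate_percent`, `gaps`, and `MEASURES`, and the dictionary layout, are assumptions made for this example.

```python
# Illustrative sketch only: recomputes the rates quoted in the abstract.
# Function names and data layout are hypothetical, not taken from the study.

ELIGIBLE = 394_476   # Synthea patients eligible for colorectal cancer screening
COMPLIANT = 248_433  # of those, patients considered compliant


def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# measure -> (Synthea %, Massachusetts %, national %), as reported in the abstract
MEASURES = {
    "Colorectal cancer screening": (rate_percent(COMPLIANT, ELIGIBLE), 77.3, 69.8),
    "COPD 30-day mortality": (0.7, 7.0, 8.0),
    "Hip/knee replacement complications": (0.0, 2.9, 2.8),
    "Controlling high blood pressure": (0.0, 74.52, 69.7),
}


def gaps(measures):
    """Percentage-point difference between the Synthea rate and each benchmark."""
    return {
        name: (round(synthea - state, 2), round(synthea - national, 2))
        for name, (synthea, state, national) in measures.items()
    }


if __name__ == "__main__":
    for name, (state_gap, national_gap) in gaps(MEASURES).items():
        print(f"{name}: {state_gap:+.2f} points vs Massachusetts, "
              f"{national_gap:+.2f} points vs national")
```

Negative gaps mean the synthetic rate falls below the publicly reported one. For the two measures where Synthea produced no events at all, the gap is simply the negative of the benchmark rate.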

CONCLUSIONS

Synthea and other synthetic patient generators do not currently model deviations in care or the outcomes that may result from them. To output a more realistic data set, we propose that synthetic data generators incorporate important quality measures into their logic and model the situations in which clinicians may deviate from standard practice.
