Quality assurance of integrative big data for medical research within a multihospital system.

Author information

Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Department of Internal Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan; Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

Integrative Medical Database Center, Department of Medical Research, National Taiwan University Hospital, Taipei, Taiwan; Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.

Publication information

J Formos Med Assoc. 2022 Sep;121(9):1728-1738. doi: 10.1016/j.jfma.2021.12.024. Epub 2022 Feb 12.

DOI:10.1016/j.jfma.2021.12.024

PMID:35168836

Abstract

BACKGROUND

There is a growing need to build medical big data from the electronic health records collected at different hospitals. Errors inevitably occur during this process, and methods for correcting them should be explored.

METHODS

Electronic health records of 9,197,817 patients and 53,081,148 visits, totaling about 500 million records for 2006-2016, were transmitted from eight hospitals into an integrated database. We randomly selected 10% of patients, accumulated the primary keys for their tabulated data, and compared the key numbers in the transmitted data with those of the raw data. Errors were identified based on statistical testing and clinical reasoning.
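The key-count comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the row layout `(table_name, patient_id, primary_key)`, the function names, and the choice to express the discrepancy per 1000 raw keys are all assumptions made for illustration.

```python
import random


def sample_patients(patient_ids, fraction=0.10, seed=0):
    """Randomly select a fraction of patients (10% in the abstract)."""
    ids = sorted(set(patient_ids))
    k = max(1, round(len(ids) * fraction))
    return set(random.Random(seed).sample(ids, k))


def key_counts(rows, sampled):
    """Count distinct primary keys per table for the sampled patients.

    rows: iterable of (table_name, patient_id, primary_key) tuples
    (a hypothetical layout, not the study's schema).
    """
    keys = {}
    for table, patient_id, pk in rows:
        if patient_id in sampled:
            keys.setdefault(table, set()).add(pk)
    return {table: len(pks) for table, pks in keys.items()}


def compare_counts(raw_rows, transmitted_rows, sampled):
    """Report tables whose key counts differ between raw and transmitted data."""
    raw = key_counts(raw_rows, sampled)
    sent = key_counts(transmitted_rows, sampled)
    report = {}
    for table in sorted(set(raw) | set(sent)):
        r, s = raw.get(table, 0), sent.get(table, 0)
        if r != s:
            # Assumption: the discrepancy is expressed per 1000 raw keys.
            rate = 1000 * abs(r - s) / r if r else float("inf")
            report[table] = {"raw": r, "transmitted": s, "per_1000": rate}
    return report
```

Only tables with mismatched counts appear in the report, so a reviewer would inspect a short list rather than every table.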

RESULTS

Data were recorded in 1573 tables. Among these, 58 (3.7%) had different key numbers, with a maximum difference of 16.34 per 1000. Statistical differences (P < 0.05) were found in 34 (58.6%), of which 15 were caused by changes in diagnostic codes, wrong accounts, or modified orders. For the rest, the differences were related to the accumulation of hospital visits over time. Of the remaining 24 tables (41.4%) without significant differences, three were revised because of incorrect computer programming or wrong accounts. For the rest, the programming was correct and the absolute differences were negligible. The applicability was confirmed using the data of 2,730,883 patients and 15,647,468 patient-visits transmitted during 2017-2018, in which 10 (3.5%) tables were corrected.
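The abstract does not name the statistical test used to flag tables. As one possible illustration only, the sketch below applies a one-degree-of-freedom chi-square test to a pair of key counts, under the null hypothesis that the raw and transmitted counts share the same expected value. The test choice, the function names, and the 0.05 threshold as a default argument are assumptions.

```python
import math


def equal_count_test(raw_count, transmitted_count):
    """Chi-square test (1 df) of H0: both counts have the same expected value.

    Assumption: this test is a stand-in; the abstract does not specify one.
    """
    total = raw_count + transmitted_count
    if total == 0:
        return 0.0, 1.0
    # With expected value total/2 for each count, the statistic reduces to this.
    chi2 = (raw_count - transmitted_count) ** 2 / total
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def triage(raw_count, transmitted_count, alpha=0.05):
    """Label a table by whether its count difference is significant."""
    _, p = equal_count_test(raw_count, transmitted_count)
    return "significant" if p < alpha else "not significant"
```

Under this sketch, the same relative discrepancy can be significant in a large table and not significant in a small one, so a significance flag alone would not settle whether a table needs revision. That is in keeping with the abstract's pairing of statistical testing with clinical reasoning.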

CONCLUSION

A substantial amount of inconsistent data does arise during the transmission of big data from diverse sources. Systematic validation is essential. Comparing the number of data records tabulated using the primary keys allows us to rapidly identify and correct these scattered errors.

Similar articles

Role of multihospital system membership in electronic medical record adoption.

Health Care Manage Rev. 2008 Apr-Jun;33(2):169-77. doi: 10.1097/01.HMR.0000304502.20179.32.

Effect of Restriction of the Number of Concurrently Open Records in an Electronic Health Record on Wrong-Patient Order Errors: A Randomized Clinical Trial.

JAMA. 2019 May 14;321(18):1780-1787. doi: 10.1001/jama.2019.3698.

Administrative Data Use in National Registry Efforts: Blessing or Curse?

J Bone Joint Surg Am. 2022 Oct 19;104(Suppl 3):39-46. doi: 10.2106/JBJS.22.00565.

Concurrence of big data analytics and healthcare: A systematic review.

Int J Med Inform. 2018 Jun;114:57-65. doi: 10.1016/j.ijmedinf.2018.03.013. Epub 2018 Mar 26.

Development and implementation of a dynamically updated big data intelligence platform from electronic health records for nasopharyngeal carcinoma research.

Br J Radiol. 2019 Oct;92(1102):20190255. doi: 10.1259/bjr.20190255. Epub 2019 Aug 20.

A systematic quality assurance framework for the upgrade of radiation oncology information systems.

Phys Med. 2020 Jan;69:28-35. doi: 10.1016/j.ejmp.2019.11.024. Epub 2019 Dec 5.

Applications of Artificial Intelligence and Big Data Analytics in m-Health: A Healthcare System Perspective.

J Healthc Eng. 2020 Aug 30;2020:8894694. doi: 10.1155/2020/8894694. eCollection 2020.

Impact of big data on oral health outcomes.

Oral Dis. 2019 Jul;25(5):1245-1252. doi: 10.1111/odi.13007. Epub 2018 Dec 28.

A method for cohort selection of cardiovascular disease records from an electronic health record system.

Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.

Cited by

Clinical and Economic Evaluation of a Real-Time Chest X-Ray Computer-Aided Detection System for Misplaced Endotracheal and Nasogastric Tubes and Pneumothorax in Emergency and Critical Care Settings: Protocol for a Cluster Randomized Controlled Trial.

JMIR Res Protoc. 2025 Aug 20;14:e72928. doi: 10.2196/72928.

Subtypes of Intracranial Carotid Arteriosclerosis and Vascular Prognosis in Chronic Kidney Disease Patients.

Kidney Dis (Basel). 2025 Jun 10;11(1):508-517. doi: 10.1159/000546853. eCollection 2025 Jan-Dec.

Exploring gender-specific prognostic factors and survival outcomes in oral squamous cell carcinoma: Insights from a Taiwanese cohort.

J Dent Sci. 2025 Jul;20(3):1832-1842. doi: 10.1016/j.jds.2025.04.026. Epub 2025 May 9.

Interpretable Independent Recurrent Networks for Forecasting Stroke in Atrial Fibrillation.

JACC Asia. 2025 Aug;5(8):966-978. doi: 10.1016/j.jacasi.2025.04.003. Epub 2025 Jun 10.

Development and validation of a five-year cardiovascular risk assessment tool for Asian adults aged 75 years and older.

BMC Geriatr. 2025 Jan 8;25(1):15. doi: 10.1186/s12877-024-05660-4.

High SAFE scores predict hepatocellular carcinoma in viral and non-viral hepatitis and metabolic dysfunction associated steatotic liver disease.

Clin Mol Hepatol. 2025 Jan 6. doi: 10.3350/cmh.2024.0822.

Effect of body mass index on mortality for diabetic patients with aortic stenosis.

Aging (Albany NY). 2024 Jul 24;16(14):11359-11372. doi: 10.18632/aging.206018.

Prediabetes increases the risk of major limb and cardiovascular events.

Cardiovasc Diabetol. 2023 Dec 19;22(1):348. doi: 10.1186/s12933-023-02085-y.

A 20-year study of autoimmune polyendocrine syndrome type II and III in Taiwan.

Eur Thyroid J. 2023 Nov 23;12(6). doi: 10.1530/ETJ-23-0162. Print 2023 Dec 1.

Distinct effects of hepatic steatosis and metabolic dysfunction on the risk of hepatocellular carcinoma in chronic hepatitis B.

Hepatol Int. 2023 Oct;17(5):1139-1149. doi: 10.1007/s12072-023-10545-6. Epub 2023 May 29.
