Optimizing the Retrieval of the Vital Status of Cancer Patients for Health Data Warehouses by Using Open Government Data in France.

Affiliations

Analytics Department & Data Factory, Institut de Cancérologie de l'Ouest, F-44805 Nantes-Angers, France.

Oncology Department, Institut de Cancérologie de l'Ouest, F-44805 Nantes-Angers, France.

Publication information

Int J Environ Res Public Health. 2022 Apr 2;19(7):4272. doi: 10.3390/ijerph19074272.

Abstract

Electronic Medical Records (EMRs) and Electronic Health Records (EHRs) often lack critical information about patient deaths, even though the date of death is an essential metric in oncology research for assessing survival outcomes, particularly when evaluating the efficacy of new therapeutic approaches. We used French open government data on deaths from 1970 to September 2021 to identify deceased patients. We then matched them with patient data collected between January 2015 and November 2021 in the data warehouse of the Institut de Cancérologie de l'Ouest (ICO; Integrated Center of Oncology, the third-largest cancer center in France). To meet this objective, we evaluated two algorithms for deterministic record linkage: an exact matching algorithm and a fuzzy matching algorithm. Because no reference data were available, we assessed the algorithms by estimating the number of homonyms that could lead to false links, using the same open dataset of deceased persons in France. The exact matching algorithm doubled the number of dates of death in the ICO data warehouse, and the fuzzy matching algorithm tripled it. The homonym analysis indicated a low risk of misidentification, with precision values of 99.96% for exact matching and 99.68% for fuzzy matching. However, estimating the number of false negatives proved more difficult than anticipated. Nevertheless, open government data can be a highly promising way to improve the completeness of the date-of-death variable for oncology patients in data warehouses.
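The two linkage strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration:

- the record fields (surname, first name, sex, date of birth);
- the accent-stripping normalization;
- the decision to leave ambiguous homonyms unlinked;
- the fuzzy variant, which uses `difflib.SequenceMatcher` with a hypothetical 0.85 name-similarity threshold.

Precision is computed in the usual way, as true links / (true links + false links).

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical minimal layout shared by warehouse patients and death records.
    surname: str
    first_name: str
    sex: str  # "M" or "F"
    birth_date: date
    death_date: Optional[date] = None


def normalize(name: str) -> str:
    """Strip accents, drop non-letters and uppercase: 'Hélène-Marie' -> 'HELENEMARIE'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if c.isalpha()).upper()


def exact_key(p: Person) -> tuple:
    return (normalize(p.surname), normalize(p.first_name), p.sex, p.birth_date)


def exact_link(patients, deaths):
    """Link a patient only if exactly one death record shares its full normalized key."""
    index = defaultdict(list)
    for d in deaths:
        index[exact_key(d)].append(d)
    links = {}
    for p in patients:
        candidates = index.get(exact_key(p), [])
        if len(candidates) == 1:  # several candidates = homonyms, left unlinked
            links[p] = candidates[0]
    return links


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_link(patients, deaths, threshold: float = 0.85):
    """Block on sex and birth date, then compare names approximately (hypothetical threshold)."""
    blocks = defaultdict(list)
    for d in deaths:
        blocks[(d.sex, d.birth_date)].append(d)
    links = {}
    for p in patients:
        hits = [
            d for d in blocks.get((p.sex, p.birth_date), [])
            if min(similarity(p.surname, d.surname),
                   similarity(p.first_name, d.first_name)) >= threshold
        ]
        if len(hits) == 1:  # accept only an unambiguous candidate
            links[p] = hits[0]
    return links


def precision(true_links: int, false_links: int) -> float:
    return true_links / (true_links + false_links)


# Toy data, invented for illustration only.
patients = [
    Person("Dupont", "Hélène", "F", date(1950, 3, 1)),
    Person("Martin", "Jean", "M", date(1940, 1, 1)),
    Person("Durand", "Luc", "M", date(1960, 5, 5)),
]
deaths = [
    Person("DUPONT", "Helene", "F", date(1950, 3, 1), date(2020, 6, 1)),
    Person("Martinn", "Jean", "M", date(1940, 1, 1), date(2019, 2, 2)),
    Person("Durand", "Luc", "M", date(1960, 5, 5), date(2018, 1, 1)),
    Person("Durand", "Luc", "M", date(1960, 5, 5), date(2017, 1, 1)),
]

exact = exact_link(patients, deaths)
fuzzy = fuzzy_link(patients, deaths)
print(len(exact), len(fuzzy))  # 1 2
```

In this toy data, exact matching links only the Dupont record. The fuzzy variant also recovers the misspelled Martin record. The two identical Durand candidates are homonyms and stay unlinked under both strategies, which mirrors why homonyms matter for estimating false links.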

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a0d/8998644/7d52b59fc7e2/ijerph-19-04272-g001.jpg