Assessment and Improvement of Drug Data Structuredness From Electronic Health Records: Algorithm Development and Validation.

Author information

Reinecke Ines, Siebel Joscha, Fuhrmann Saskia, Fischer Andreas, Sedlmayr Martin, Weidner Jens, Bathelt Franziska

Affiliations

Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany.

Center for Evidence-Based Healthcare, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany.

Publication information

JMIR Med Inform. 2023 Jan 25;11:e40312. doi: 10.2196/40312.

PMID: 36696159

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9909518/

Abstract

BACKGROUND

Digitization offers a multitude of opportunities to gain insights into current diagnostics and therapies from retrospective data. In this context, real-world data and their accessibility are of increasing importance to support unbiased and reliable research on big data. However, routinely collected data are not readily usable for research owing to the unstructured nature of health care systems and a lack of interoperability between these systems. This challenge is evident in drug data.

OBJECTIVE

This study aimed to present an approach that identifies and increases the structuredness of drug data while ensuring standardization according to Anatomical Therapeutic Chemical (ATC) classification.
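ATC codes follow a fixed five-level hierarchy, which is what makes them usable as a standardization target. A full code such as N02BE01 (paracetamol) encodes the anatomical main group (N), the therapeutic subgroup (N02), the pharmacological subgroup (N02B), the chemical subgroup (N02BE), and the chemical substance (N02BE01). As an illustration only, the following Python sketch checks the format of a level-5 code and splits it into these levels. The names `parse_atc` and `AtcLevels` are hypothetical and not part of the study, and the check covers syntax only, not whether a code is actually assigned in the WHO ATC index.

```python
import re
from typing import NamedTuple, Optional

# Level-5 ATC layout: letter, 2 digits, letter, letter, 2 digits (e.g. "N02BE01").
ATC_LEVEL5 = re.compile(r"[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}")


class AtcLevels(NamedTuple):
    anatomical: str       # level 1: anatomical main group, e.g. "N"
    therapeutic: str      # level 2: therapeutic subgroup, e.g. "N02"
    pharmacological: str  # level 3: pharmacological subgroup, e.g. "N02B"
    chemical: str         # level 4: chemical subgroup, e.g. "N02BE"
    substance: str        # level 5: chemical substance, e.g. "N02BE01"


def parse_atc(code: str) -> Optional[AtcLevels]:
    """Split a level-5 ATC code into its five hierarchy levels.

    Returns None for anything that is not syntactically a level-5 code.
    Only the format is checked; the code is not looked up in any index.
    """
    normalized = code.strip().upper()
    if ATC_LEVEL5.fullmatch(normalized) is None:
        return None
    return AtcLevels(
        anatomical=normalized[:1],
        therapeutic=normalized[:3],
        pharmacological=normalized[:4],
        chemical=normalized[:5],
        substance=normalized,
    )


if __name__ == "__main__":
    print(parse_atc("N02BE01"))
    print(parse_atc("paracetamol 500 mg"))  # free text is not a code, so None
```

A prescription counts as structured in this sense only when it carries a code that passes such a check; free text like "paracetamol 500 mg" does not.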

METHODS

Our approach was based on available drug prescriptions and a drug catalog and consisted of 4 steps. First, we analyzed the structuredness of the local drug data to establish a baseline against which the effectiveness of the overall approach could be measured. Second, we applied 3 algorithms to the unstructured data; these translated free text into ATC codes using string comparisons of ingredient and product names, as well as similarity comparisons based on Levenshtein distance. Third, we validated the results of the 3 algorithms against expert knowledge, using the 1000 most frequently used prescription texts. Fourth, we performed a final validation to determine the resulting increase in structuredness.
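The abstract does not specify how the work is divided among the 3 algorithms, how prescription texts were normalized, or which Levenshtein threshold was used. The following Python sketch therefore only illustrates the general idea under stated assumptions: a hypothetical catalog given as two name-to-ATC dictionaries (`PRODUCTS`, `INGREDIENTS`), a hypothetical `find_atc` helper that tries an exact product-name match, then an exact ingredient match, then a Levenshtein-based similarity match, and an arbitrarily chosen threshold of 0.2 on the edit distance divided by the longer string's length. The steps in the code are not meant to correspond one-to-one to the study's algorithms 1 to 3.

```python
import re
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions turning a into b (two-row DP)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def tokens(text: str) -> List[str]:
    """Lowercase and keep only alphabetic runs, so dose numbers like '400' are dropped."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def find_atc(
    text: str,
    products: Dict[str, str],
    ingredients: Dict[str, str],
    max_ratio: float = 0.2,  # assumed threshold on length-normalized distance
) -> Tuple[Optional[str], str]:
    """Return (ATC code, method) for a free-text prescription, or (None, 'unmatched')."""
    words = tokens(text)
    padded = f" {' '.join(words)} "

    # Exact whole-word match on product names first, then on ingredient names.
    for catalog, method in ((products, "product"), (ingredients, "ingredient")):
        for name, code in catalog.items():
            key = " ".join(tokens(name))
            if key and f" {key} " in padded:
                return code, method

    # Similarity fallback: compare each name with word windows of equal word count.
    best: Optional[Tuple[float, str]] = None
    for catalog in (products, ingredients):
        for name, code in catalog.items():
            key_words = tokens(name)
            if not key_words:
                continue
            key = " ".join(key_words)
            n = len(key_words)
            for start in range(len(words) - n + 1):
                window = " ".join(words[start:start + n])
                ratio = levenshtein(window, key) / max(len(window), len(key))
                if ratio <= max_ratio and (best is None or ratio < best[0]):
                    best = (ratio, code)
    return (best[1], "similarity") if best else (None, "unmatched")


# Hypothetical catalog entries; the product name "Painex Forte" is invented.
PRODUCTS = {"Painex Forte": "N02BE01"}
INGREDIENTS = {"paracetamol": "N02BE01", "ibuprofen": "M01AE01", "pantoprazole": "A02BC02"}

if __name__ == "__main__":
    for text in ["Painex Forte 500 mg", "Ibuprofen 400mg", "Pantoprazol 40 mg", "herbal tea"]:
        print(text, "->", find_atc(text, PRODUCTS, INGREDIENTS))
```

Under these assumptions, the German spelling "Pantoprazol 40 mg" misses the exact ingredient match but is caught by the similarity step (distance 1 to "pantoprazole"), while unrelated text stays unmatched. A looser threshold would catch more misspellings but also produce more false matches.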

RESULTS

Initially, 47.73% (n=843,980) of 1,768,153 drug prescriptions were classified as structured. Applying the 3 algorithms to the 1000 most frequent medication prescriptions increased the degree of structuredness to 85.18% (n=1,506,059). Here, the combination of algorithms 1, 2, and 3 reached a correctness of 100% (57,264 ATC codes identified), algorithms 1 and 3 reached 99.6% (152,404 codes identified), and algorithms 1 and 2 reached 95.9% (39,472 codes identified).
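The reported structuredness percentages follow directly from the prescription counts, which a short Python check can confirm using only the numbers stated above. The `correctness` helper is an assumption for illustration: the abstract does not define its correctness metric, and the sketch simply takes it to be the share of expert-reviewed texts whose algorithmically assigned ATC code agrees with the expert's code.

```python
from typing import Dict

TOTAL_PRESCRIPTIONS = 1_768_153
STRUCTURED_BEFORE = 843_980
STRUCTURED_AFTER = 1_506_059


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


def correctness(predicted: Dict[str, str], expert: Dict[str, str]) -> float:
    """Assumed metric: percentage of expert-reviewed texts whose predicted code matches."""
    reviewed = [text for text in predicted if text in expert]
    if not reviewed:
        return 0.0
    hits = sum(predicted[text] == expert[text] for text in reviewed)
    return round(100 * hits / len(reviewed), 2)


if __name__ == "__main__":
    print(share(STRUCTURED_BEFORE, TOTAL_PRESCRIPTIONS))  # 47.73
    print(share(STRUCTURED_AFTER, TOTAL_PRESCRIPTIONS))   # 85.18
    print(STRUCTURED_AFTER - STRUCTURED_BEFORE)           # 662079 newly structured prescriptions
    # Toy data only: one of two reviewed texts agrees with the expert.
    print(correctness({"text a": "N02BE01", "text b": "M01AE01"},
                      {"text a": "N02BE01", "text b": "A02BC02"}))  # 50.0
```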

CONCLUSIONS

As the first analysis steps of our approach showed, the availability of a product catalog from which to select during documentation is not sufficient to generate structured data. Our 4-step approach mitigates these problems and reliably increases structuredness automatically. Similarity matching shows promising results, particularly for entries with no link to a product catalog. However, ways to further improve the correctness of such a similarity matching algorithm need to be investigated in future work.

Similar articles

Pharmaceutical Feedback Loop - A Concept to Improve Prescription Safety and Data Quality.

Stud Health Technol Inform. 2022 Aug 31;298:73-77. doi: 10.3233/SHTI220910.

Fitness for Use of Anatomical Therapeutic Classification for Real World Data Research.

Stud Health Technol Inform. 2023 May 18;302:711-715. doi: 10.3233/SHTI230245.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

[Exploration and example interpretation of real-world herbal prescription classification based on similarity matching algorithm].

Zhongguo Zhong Yao Za Zhi. 2023 Feb;48(4):1132-1136. doi: 10.19540/j.cnki.cjcmm.20221027.501.

Network predicting drug's anatomical therapeutic chemical code.

Bioinformatics. 2013 May 15;29(10):1317-24. doi: 10.1093/bioinformatics/btt158. Epub 2013 Apr 5.

Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization.

J Biomed Inform. 2022 Sep;133:104147. doi: 10.1016/j.jbi.2022.104147. Epub 2022 Jul 21.

SIAP: an intelligent algorithm for multiple prescription pattern recognition based on weighted similarity distances.

BMC Med Inform Decis Mak. 2023 May 4;23(1):79. doi: 10.1186/s12911-023-02141-3.

Analyzing U.S. prescription lists with RxNorm and the ATC/DDD Index.

AMIA Annu Symp Proc. 2014 Nov 14;2014:297-306. eCollection 2014.

A data extraction algorithm for assessment of contraceptive counseling and provision.

Am J Obstet Gynecol. 2018 Mar;218(3):333.e1-333.e5. doi: 10.1016/j.ajog.2017.11.578. Epub 2017 Nov 23.

Cited by

Evaluating and Enhancing the Fitness-for-Purpose of Electronic Health Record Data: Qualitative Study on Current Practices and Pathway to an Automated Approach Within the Medical Informatics for Research and Care in University Medicine Consortium.

JMIR Med Inform. 2024 Aug 19;12:e57153. doi: 10.2196/57153.

Creating a Medication Therapy Observational Research Database from an Electronic Medical Record: Challenges and Data Curation.

Appl Clin Inform. 2024 Jan;15(1):111-118. doi: 10.1055/s-0043-1777741. Epub 2024 Feb 7.
