ATCodeR: a dictionary-based R-tool to standardize medication free-text.

Authors

Schnorr Isabel, Andreas Stefanie, Schumann Linnea, Hahn Svenja, Vehreschild Jörg Janne, Maier Daniel

Affiliations

Faculty of Medicine, Institute for Digital Medicine and Clinical Data Sciences, Goethe University Frankfurt, Frankfurt, Germany.

Medical Department 2 (Hematology/Oncology and Infectious Diseases), Center for Internal Medicine, University Hospital, Goethe University Frankfurt, Frankfurt, Germany.

Publication information

Sci Rep. 2025 Apr 10;15(1):12252. doi: 10.1038/s41598-025-97150-9.

DOI: 10.1038/s41598-025-97150-9
PMID: 40211013
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11986091/
Abstract

Over the past decades, oncology treatment paradigms have developed significantly. Yet the often unstructured documentation of substances in medical records makes analyzing treatment patterns and outcomes time-consuming. To advance oncological research further, clinical data science must offer solutions that facilitate research and analysis with real-world data (RWD). The present contribution introduces a user-friendly R-tool that applies a dictionary-based approach to transform free-text medication entries into codes of the structured Anatomical Therapeutic Chemical (ATC) Classification System. The output is a structured data frame with columns for antineoplastic medication, other medications, and supplementary information. For accuracy validation, 561 data entries from an evaluation data set, comprising 935 tokens, were reviewed. Of these tokens, 88.5% were successfully transformed into their respective ATC codes. Additional information was extracted from 129 data entries (23%), while 23 entries (4.1%) contained no usable information. All tokens underwent manual review; the transformation failed for 84 tokens (8.9%). This approach improves the standardization and analysis of systemic anti-cancer treatment data in German-speaking regions, increasing efficiency while maintaining accuracy.
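The dictionary-based approach described in the abstract can be illustrated with a minimal Python sketch. It is not ATCodeR's implementation: the tool itself is written in R, and its dictionary, tokenization rules, and function names are not given here. `ATC_DICT`, `tokenize`, `code_entry`, and `CodedEntry` are hypothetical names, the dictionary holds only a few illustrative entries, and two behaviours are assumptions: codes starting with `L01` (antineoplastic agents in the ATC system) fill the antineoplastic column, and tokens without a dictionary hit are kept as supplementary information.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, illustrative dictionary: lower-cased free-text spellings -> ATC code.
# Check codes against the current WHO ATC/DDD index before any real use.
ATC_DICT = {
    "cisplatin": "L01XA01",
    "fluorouracil": "L01BC02",
    "5-fu": "L01BC02",           # common abbreviation for fluorouracil
    "ondansetron": "A04AA01",
    "dexamethason": "H02AB02",   # German spelling
    "dexamethasone": "H02AB02",  # English spelling
}


@dataclass
class CodedEntry:
    """One structured row per free-text entry (column layout is an assumption)."""
    antineoplastic: list[str] = field(default_factory=list)  # ATC codes starting with L01
    other: list[str] = field(default_factory=list)           # all other matched ATC codes
    info: list[str] = field(default_factory=list)            # tokens without a dictionary hit


def tokenize(text: str) -> list[str]:
    # Split on commas, semicolons, slashes, "+" and whitespace;
    # hyphens are kept so that "5-FU" stays a single token.
    return [t for t in re.split(r"[,;/+\s]+", text.lower()) if t]


def code_entry(text: str, dictionary: dict[str, str] = ATC_DICT) -> CodedEntry:
    # Exact dictionary lookup per token, then route the code to a column.
    entry = CodedEntry()
    for token in tokenize(text):
        code = dictionary.get(token)
        if code is None:
            entry.info.append(token)
        elif code.startswith("L01"):
            entry.antineoplastic.append(code)
        else:
            entry.other.append(code)
    return entry
```

For example, `code_entry("Cisplatin + 5-FU, Dexamethason 8mg")` places `L01XA01` and `L01BC02` in `antineoplastic`, `H02AB02` in `other`, and keeps `8mg` in `info`. A list of such rows could be turned into a data frame (for instance via `dataclasses.asdict` and pandas) to mirror the column layout the abstract describes. An exact-match lookup like this cannot resolve misspellings or abbreviations missing from the dictionary, which is one plausible source of failed transformations in any dictionary-based approach.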

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9daf/11986091/7545df2e6163/41598_2025_97150_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9daf/11986091/e1bdc58f1804/41598_2025_97150_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9daf/11986091/2412c306b8ae/41598_2025_97150_Fig2_HTML.jpg

Similar articles

1
ATCodeR: a dictionary-based R-tool to standardize medication free-text.
Sci Rep. 2025 Apr 10;15(1):12252. doi: 10.1038/s41598-025-97150-9.
2
Designing an openEHR-Based Pipeline for Extracting and Standardizing Unstructured Clinical Data Using Natural Language Processing.
Methods Inf Med. 2020 Dec;59(S 02):e64-e78. doi: 10.1055/s-0040-1716403. Epub 2020 Oct 14.
3
Patient free text reporting of symptomatic adverse events in cancer clinical research using the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE).
J Am Med Inform Assoc. 2019 Apr 1;26(4):276-285. doi: 10.1093/jamia/ocy169.
4
Manual versus automated coding of free-text self-reported medication data in the 45 and Up Study: a validation study.
Public Health Res Pract. 2015 Mar 30;25(2):e2521518. doi: 10.17061/phrp2521518.
5
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6
An Electronic Health Record Text Mining Tool to Collect Real-World Drug Treatment Outcomes: A Validation Study in Patients With Metastatic Renal Cell Carcinoma.
Clin Pharmacol Ther. 2020 Sep;108(3):644-652. doi: 10.1002/cpt.1966. Epub 2020 Jul 18.
7
Enhancing Transparency in Defining Studied Drugs: The Open-Source Living DiAna Dictionary for Standardizing Drug Names in the FAERS.
Drug Saf. 2024 Mar;47(3):271-284. doi: 10.1007/s40264-023-01391-4. Epub 2024 Jan 4.
8
Assessment and Improvement of Drug Data Structuredness From Electronic Health Records: Algorithm Development and Validation.
JMIR Med Inform. 2023 Jan 25;11:e40312. doi: 10.2196/40312.
9
Utility analysis and demonstration of real-world clinical texts: A case study on Japanese cancer-related EHRs.
PLoS One. 2024 Sep 11;19(9):e0310432. doi: 10.1371/journal.pone.0310432. eCollection 2024.
10
Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data.
JMIR Form Res. 2023 Mar 7;7:e43014. doi: 10.2196/43014.
