Schnorr Isabel, Andreas Stefanie, Schumann Linnea, Hahn Svenja, Vehreschild Jörg Janne, Maier Daniel
Faculty of Medicine, Institute for Digital Medicine and Clinical Data Sciences, Goethe University Frankfurt, Frankfurt, Germany.
Medical Department 2 (Hematology/Oncology and Infectious Diseases), Center for Internal Medicine, University Hospital, Goethe University Frankfurt, Frankfurt, Germany.
Sci Rep. 2025 Apr 10;15(1):12252. doi: 10.1038/s41598-025-97150-9.
Over the past decades, oncology treatment paradigms have evolved significantly. Yet the often unstructured documentation of medications in medical records makes analyzing treatment patterns and outcomes time-consuming. To advance oncological research further, clinical data science must offer solutions that facilitate research and analysis with real-world data (RWD). The present contribution introduces a user-friendly R tool that maps free-text medication entries to codes of the Anatomical Therapeutic Chemical (ATC) Classification System using a dictionary-based approach. The output is a structured data frame with columns for antineoplastic medication, other medications, and supplementary information. To validate accuracy, 561 data entries from an evaluation data set, comprising 935 tokens, were reviewed. Of these tokens, 88.5% were successfully transformed into their respective ATC codes. Additional information was extracted from 129 data entries (23%), while 23 entries (4.1%) contained no usable information. All tokens underwent manual review; transformation failed for 84 tokens (8.9%). This approach improves the standardization and analysis of systemic anti-cancer treatment data in German-speaking regions by optimizing efficiency while maintaining relevant accuracy.
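The dictionary-based mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' R implementation: it is written in Python for brevity, and the tiny dictionary, the separator set, and the names `ATC_DICTIONARY`, `tokenize`, and `transform_entry` are all assumptions made for illustration. The only elements taken from the abstract are the three output categories (antineoplastic medication, other medications, supplementary information).

```python
import re

# Hypothetical mini-dictionary: lowercase substance name -> ATC code.
# A real dictionary would also cover brand names, abbreviations, and typos.
ATC_DICTIONARY = {
    "cisplatin": "L01XA01",
    "carboplatin": "L01XA02",
    "paclitaxel": "L01CD01",
    "dexamethason": "H02AB02",   # German spelling
    "dexamethasone": "H02AB02",
    "ondansetron": "A04AA01",
}


def tokenize(entry):
    """Split a free-text entry into tokens on '+', ',' and ';' (assumed separators)."""
    parts = re.split(r"[+,;]", entry.lower())
    return [part.strip() for part in parts if part.strip()]


def transform_entry(entry):
    """Sort each token's content into antineoplastic, other, or additional information."""
    result = {"antineoplastic": [], "other": [], "additional_info": []}
    for token in tokenize(entry):
        words = token.split()
        leftovers = []
        for word in words:
            code = ATC_DICTIONARY.get(word)
            if code is None:
                # Words without a dictionary match are kept as supplementary text.
                leftovers.append(word)
            elif code.startswith("L01"):
                # ATC group L01 holds antineoplastic agents.
                result["antineoplastic"].append(code)
            else:
                result["other"].append(code)
        if leftovers:
            result["additional_info"].append(" ".join(leftovers))
    return result


if __name__ == "__main__":
    print(transform_entry("Cisplatin + Paclitaxel, Dexamethason 8 mg"))
```

Under these assumptions, an entry such as "Cisplatin + Paclitaxel, Dexamethason 8 mg" yields two antineoplastic codes, one other medication code, and the dose string as supplementary information. An entry with no dictionary match ends up entirely in the supplementary column, which is one way entries without usable medication information could be flagged for manual review.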