Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Im Neuenheimer Feld 410, 69120, Heidelberg, Germany; Cooperation Unit Clinical Pharmacy, University of Heidelberg, Im Neuenheimer Feld 410, 69120, Heidelberg, Germany.
datapeutics GmbH, Hans-Bunte-Straße 8-10, 69123, Heidelberg, Germany.
Int J Med Inform. 2020 Jan;133:103970. doi: 10.1016/j.ijmedinf.2019.103970. Epub 2019 Sep 16.
Patients most commonly request drug information about the potential adverse drug reactions (ADRs) of their drugs. Such information should be customizable to individual information needs. While approaches that automatically aggregate ADRs through text mining and compile them in databases are well established, efforts to map additional ADR information are sparse, yet crucial for customization. In a proof-of-principle (PoP) study, we developed a database format demonstrating that natural language processing can further structure ADR information in a way that facilitates customization.
We developed the database in a 3-step process: (1) initial ADR extraction, (2) mapping of additional ADR information, and (3) review process. ADRs of 10 frequently prescribed active ingredients were initially extracted from their Summary of Product Characteristics (SmPC) by text-mining processes and mapped to Medical Dictionary for Regulatory Activities (MedDRA) terms. To further structure ADR information, we mapped 7 additional ADR characteristics (i.e. frequency, organ class, seriousness, lay perceptibility, onset, duration, and management strategies) to individual ADRs. In a PoP study, the process steps were assessed and tested. Initial ADR extraction was assessed by measuring precision, recall, and F-scores (i.e. harmonic mean of precision and recall). Mapping of additional ADR information was assessed considering pre-defined parameters (i.e. correctness, errors, and misses) regarding the mapped ADR characteristics.
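As a rough illustration of the kind of record such a database might hold, the sketch below models one ADR together with the 7 additional characteristics named above and filters records for a patient-facing view. The class name `AdrRecord`, the field names, the helper `select_for_patient`, and the sample values are all hypothetical and not taken from the study; they only show how structured characteristics could support customization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdrRecord:
    """One ADR mapped to a MedDRA term plus 7 additional characteristics.

    All field names are illustrative assumptions, not the study's schema.
    """
    meddra_term: str
    frequency: str = "frequency not known"  # fallback used when no frequency is stated
    organ_class: Optional[str] = None
    serious: Optional[bool] = None
    lay_perceptible: Optional[bool] = None
    onset: Optional[str] = None
    duration: Optional[str] = None
    management: Optional[str] = None


def select_for_patient(records: List[AdrRecord],
                       only_perceptible: bool = True,
                       only_serious: bool = False) -> List[AdrRecord]:
    """Return the records matching a patient's (hypothetical) information needs."""
    selected = []
    for rec in records:
        if only_perceptible and rec.lay_perceptible is not True:
            continue
        if only_serious and rec.serious is not True:
            continue
        selected.append(rec)
    return selected


# Placeholder records with made-up values, for illustration only.
records = [
    AdrRecord("Headache", frequency="common", lay_perceptible=True, serious=False),
    AdrRecord("Hepatic enzyme increased", lay_perceptible=False, serious=False),
]
visible = select_for_patient(records)
print([rec.meddra_term for rec in visible])
```

Keeping each characteristic as its own field is what makes this kind of filtering possible; a free-text ADR list would not support it.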
Overall, the SmPCs listed 393 ADRs, with an average of 39.3 ± 18.1 ADRs per SmPC. For initial ADR extraction, precision was 97.9% and recall was 93.2%, yielding an F-score of 95.5%. Regarding the mapping of additional ADR information, the frequency information of 28.6 ± 18.4 ADRs per SmPC was correctly mapped (72.8%). Overall, 77 (20.6%) of the correctly extracted ADRs had no concise frequency stated in the SmPC and were consequently mapped as 'frequency not known'. Mapping of the remaining ADR characteristics did not result in noteworthy errors or misses.
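The reported F-score can be checked against the stated precision and recall using the harmonic-mean definition. The function name `f_score` is an assumption made for this sketch; the input values are the ones reported above.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision 97.9% and recall 93.2%, as reported for initial ADR extraction.
f = f_score(0.979, 0.932)
print(f"{f:.1%}")  # 95.5%
```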
ADR information can be automatically extracted and mapped to corresponding MedDRA terms. Additionally, ADR information can be further structured considering additional ADR characteristics to facilitate customization to individual patient needs.