Use of text-mining methods to improve efficiency in the calculation of drug exposure to support pharmacoepidemiology studies.

Affiliations

Public Health and Intelligence Strategic Business Unit, NHS National Services Scotland, Edinburgh, UK.

Farr Institute of Health Informatics Research, Edinburgh, UK.

Publication information

Int J Epidemiol. 2018 Apr 1;47(2):617-624. doi: 10.1093/ije/dyx264.

DOI:10.1093/ije/dyx264

PMID:29420741

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5913611/

Abstract

BACKGROUND

Efficient generation of structured dose instructions that enable researchers to calculate drug exposure is central to pharmacoepidemiology studies. Our aim was to design and test an algorithm to codify dose instructions, applied to the NHS Scotland Prescribing Information System (PIS) that records about 100 million prescriptions per annum.

METHODS

A natural language processing (NLP) algorithm was developed that enabled free-text dose instructions to be represented by three attributes - quantity, frequency and qualifier - specified by three, three and two variables, respectively. A sample of 15 593 distinct dose instructions was used to test, validate and refine the algorithm. The final algorithm used a zero-assumption approach and was then applied to the full dataset.
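The three-attribute representation can be pictured with a small rule-based sketch. This is not the authors' algorithm: the abstract gives only the number of variables per attribute (three, three and two), so the field names, regular expressions and vocabulary below are all hypothetical. The sketch reads "zero-assumption" as leaving any variable unset when the text does not state it, rather than imputing a default.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout: 3 quantity + 3 frequency + 2 qualifier variables.
# The abstract does not name the variables; these names are illustrative.
@dataclass
class DoseInstruction:
    qty_min: Optional[float] = None
    qty_max: Optional[float] = None
    qty_unit: Optional[str] = None
    freq_min: Optional[float] = None
    freq_max: Optional[float] = None
    freq_period: Optional[str] = None
    as_required: bool = False
    as_directed: bool = False

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "once": 1, "twice": 2}
NUM = r"(\d+(?:\.\d+)?|one|two|three|four)"
RANGE = rf"{NUM}(?:\s*(?:-|to|or)\s*{NUM})?"  # e.g. "1", "1-2", "one or two"
QTY_RE = re.compile(rf"\b{RANGE}\s*(tablets?|capsules?|puffs?|ml)\b")
FREQ_RE = re.compile(rf"\b(?:{RANGE}\s+times|(once|twice))\s+(?:a|per|each)\s+(day|week)\b")

def _num(token: str) -> float:
    """Convert a digit string or a known number word to a float."""
    return float(WORDS.get(token, token))

def parse(text: str) -> DoseInstruction:
    """Extract only what the text states; anything unstated stays None/False."""
    s = text.lower().strip()
    out = DoseInstruction()
    q = QTY_RE.search(s)
    if q:
        out.qty_min = _num(q.group(1))
        out.qty_max = _num(q.group(2) or q.group(1))
        out.qty_unit = q.group(3).rstrip("s")
    f = FREQ_RE.search(s)
    if f:
        lo = f.group(3) or f.group(1)
        hi = f.group(3) or f.group(2) or f.group(1)
        out.freq_min, out.freq_max = _num(lo), _num(hi)
        out.freq_period = f.group(4)
    out.as_required = bool(re.search(r"\b(as required|when required|as needed|prn)\b", s))
    out.as_directed = bool(re.search(r"\bas directed\b", s))
    return out

print(parse("Take one or two tablets twice a day when required"))
print(parse("Apply as directed"))
```

In this sketch an instruction such as "Apply as directed" yields no quantity or frequency at all, which is the kind of case where a structured record cannot support an exposure calculation without a further assumption from the data user.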

RESULTS

The initial algorithm generated structured output for 13 152 (84.34%) of the 15 593 sample dose instructions, and reviewers identified 767 (5.83%) incorrect translations, giving an accuracy of 94.17%. Following subsequent refinement of the algorithm rules, application to the full dataset of 458 227 687 prescriptions (99.67% of which had dose instructions, represented by 4 964 083 distinct instruction texts) generated structured output for 92.3% of dose instruction texts. This varied by therapeutic area (from 86.7% for the central nervous system to 96.8% for the cardiovascular system).
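The sample figures can be checked with a few lines of arithmetic. The counts below are taken from the text above and the variable names are illustrative. Note that the accuracy is computed over the 13 152 instructions that received structured output, not over the full sample of 15 593.

```python
sample = 15_593       # distinct dose instructions in the test sample
structured = 13_152   # instructions for which structured output was generated
incorrect = 767       # translations judged incorrect by reviewers

coverage = structured / sample         # share of the sample with any output
error_rate = incorrect / structured    # errors among structured outputs only
accuracy = 1 - error_rate

print(f"coverage:   {coverage:.4%}")
print(f"error rate: {error_rate:.2%}")
print(f"accuracy:   {accuracy:.2%}")

# Approximate count of prescriptions carrying a dose instruction,
# derived from the rounded 99.67% figure, so only indicative.
prescriptions = 458_227_687
with_instructions = round(prescriptions * 0.9967)
print(f"prescriptions with dose instructions: about {with_instructions:,}")
```

The coverage ratio agrees with the reported 84.34% to within rounding in the last digit.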

CONCLUSIONS

We created an NLP algorithm, operational at scale, to produce structured output that gives data users maximum flexibility to formulate, test and apply their own assumptions according to the medicines under investigation. Text mining approaches can provide a solution to the safe and efficient management and provisioning of large volumes of data generated through our health systems.
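As an illustration of what "apply their own assumptions" could mean in practice, the sketch below derives a daily amount from structured minimum and maximum values under a policy chosen by the data user. The function name, its parameters and the three policies are hypothetical and are not taken from the paper.

```python
from typing import Optional

DAYS_PER_PERIOD = {"day": 1, "week": 7}

def daily_amount(qty_min: Optional[float], qty_max: Optional[float],
                 freq_min: Optional[float], freq_max: Optional[float],
                 period: Optional[str], assume: str = "max") -> Optional[float]:
    """Return units per day under the user's chosen assumption.

    Returns None when any required value is missing, so no value is imputed.
    """
    if None in (qty_min, qty_max, freq_min, freq_max) or period not in DAYS_PER_PERIOD:
        return None
    lowest = qty_min * freq_min / DAYS_PER_PERIOD[period]
    highest = qty_max * freq_max / DAYS_PER_PERIOD[period]
    return {"min": lowest, "max": highest, "mid": (lowest + highest) / 2}[assume]

# "One or two tablets twice a day" under three different assumptions.
for policy in ("min", "mid", "max"):
    print(policy, daily_amount(1, 2, 2, 2, "day", assume=policy))
```

Keeping the range in the structured record and deferring the choice of policy lets a study of, say, analgesics assume the maximum while a study of a fixed-dose drug uses the minimum, without re-parsing the text.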
