

Use of text-mining methods to improve efficiency in the calculation of drug exposure to support pharmacoepidemiology studies.

Affiliations

Public Health and Intelligence Strategic Business Unit, NHS National Services Scotland, Edinburgh, UK.

Farr Institute of Health Informatics Research, Edinburgh, UK.

Publication information

Int J Epidemiol. 2018 Apr 1;47(2):617-624. doi: 10.1093/ije/dyx264.

Abstract

BACKGROUND

Efficient generation of structured dose instructions that enable researchers to calculate drug exposure is central to pharmacoepidemiology studies. Our aim was to design and test an algorithm to codify dose instructions, applied to the NHS Scotland Prescribing Information System (PIS) that records about 100 million prescriptions per annum.

METHODS

A natural language processing (NLP) algorithm was developed to represent free-text dose instructions by three attributes (quantity, frequency and qualifier), specified by three, three and two variables, respectively. A sample of 15 593 distinct dose instructions was used to test, validate and refine the algorithm. The final algorithm used a zero-assumption approach and was then applied to the full dataset.
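To make the three-attribute representation concrete, the following is a minimal Python sketch of a rule-based parser. It is not the authors' algorithm: the abstract does not name the eight variables, so the field names (`qty_min`, `qty_max`, `qty_unit`, `freq_min`, `freq_max`, `freq_unit`, `as_required`, `as_directed`), the regular expressions and the tiny vocabulary are all assumptions made for illustration. In the spirit of a zero-assumption approach, anything not stated in the text is left unset rather than imputed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredDose:
    # Hypothetical variable names: 3 for quantity, 3 for frequency, 2 for qualifier.
    qty_min: Optional[float] = None
    qty_max: Optional[float] = None
    qty_unit: Optional[str] = None
    freq_min: Optional[float] = None
    freq_max: Optional[float] = None
    freq_unit: Optional[str] = None
    as_required: bool = False
    as_directed: bool = False


NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
FREQ_WORDS = {"once": 1, "twice": 2}

# A number written as digits or as one of a few number words (toy vocabulary).
NUM = r"(\d+(?:\.\d+)?|one|two|three|four)"
QTY_RE = re.compile(
    rf"\b{NUM}(?:\s*(?:-|to|or)\s*{NUM})?\s*(tablet|capsule|puff|ml)s?\b"
)
FREQ_RE = re.compile(
    rf"\b(?:(once|twice)|{NUM}(?:\s*(?:-|to|or)\s*{NUM})?\s*times)"
    rf"\s*(?:a|per|each)\s*(day|week)\b"
)


def _num(token: str) -> float:
    """Convert a digit string or a known number word to a float."""
    return float(NUMBER_WORDS.get(token, token))


def parse_dose(text: str) -> StructuredDose:
    """Extract only what the text states; unmatched fields stay None/False."""
    t = text.lower().strip()
    out = StructuredDose()

    q = QTY_RE.search(t)
    if q:
        out.qty_min = _num(q.group(1))
        out.qty_max = _num(q.group(2)) if q.group(2) else out.qty_min
        out.qty_unit = q.group(3)

    f = FREQ_RE.search(t)
    if f:
        if f.group(1):  # "once" / "twice"
            out.freq_min = out.freq_max = float(FREQ_WORDS[f.group(1)])
        else:  # "N times" or "N to M times"
            out.freq_min = _num(f.group(2))
            out.freq_max = _num(f.group(3)) if f.group(3) else out.freq_min
        out.freq_unit = f.group(4)

    out.as_required = bool(re.search(r"\b(as required|when required|prn)\b", t))
    out.as_directed = "as directed" in t
    return out


if __name__ == "__main__":
    print(parse_dose("Take 1-2 tablets twice a day when required"))
    print(parse_dose("As directed"))
```

In this sketch, "Take 1-2 tablets twice a day when required" yields a quantity range of 1 to 2 tablets, a frequency of 2 per day and the as-required flag set, while "As directed" yields no quantity or frequency at all and only the as-directed flag.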

RESULTS

The initial algorithm generated structured output for 13 152 (84.34%) of the 15 593 sample dose instructions. Reviewers identified 767 (5.83%) of these outputs as incorrect translations, giving an accuracy of 94.17%. Following subsequent refinement of the algorithm rules, the algorithm was applied to the full dataset of 458 227 687 prescriptions, of which 99.67% had dose instructions, comprising 4 964 083 distinct instructions. It generated structured output for 92.3% of dose instruction texts. This proportion varied by therapeutic area, from 86.7% for the central nervous system to 96.8% for the cardiovascular system.
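The sample figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the abstract; the variable names are illustrative. It shows that the error rate, and hence the accuracy, is computed over the 13 152 instructions that received structured output, not over the full sample of 15 593.

```python
# Counts as reported in the abstract.
sample = 15_593       # distinct dose instructions in the test sample
structured = 13_152   # instructions for which structured output was generated
incorrect = 767       # structured outputs judged incorrect by reviewers

coverage = structured / sample        # share of the sample that was structured
error_rate = incorrect / structured   # denominator is the structured subset
accuracy = 1 - error_rate

print(f"coverage   = {coverage:.4%}")
print(f"error rate = {error_rate:.4%}")
print(f"accuracy   = {accuracy:.4%}")
```

Each computed value agrees with the reported percentage to within 0.01 percentage points.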

CONCLUSIONS

We created an NLP algorithm, operational at scale, to produce structured output that gives data users maximum flexibility to formulate, test and apply their own assumptions according to the medicines under investigation. Text mining approaches can provide a solution to the safe and efficient management and provisioning of large volumes of data generated through our health systems.
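As an illustration of how a data user might apply their own assumptions to such structured output, the sketch below estimates days of supply from a quantity range and a daily frequency range. Everything here is hypothetical: the function name `days_supply`, the `quantity_dispensed` input, the requirement that frequency is expressed per day, and the three assumption modes are illustrative choices, not part of the published method.

```python
from typing import Optional


def days_supply(
    quantity_dispensed: float,
    qty_min: Optional[float],
    qty_max: Optional[float],
    freq_min: Optional[float],
    freq_max: Optional[float],
    assumption: str = "max",
) -> Optional[float]:
    """Estimate days of supply, assuming the frequency is per day.

    The caller chooses how to resolve ranges:
      "max" -> highest stated dose and frequency (shortest duration)
      "min" -> lowest stated dose and frequency (longest duration)
      "mid" -> midpoint of each range
    Returns None when any component is missing, so nothing is imputed.
    """
    if None in (qty_min, qty_max, freq_min, freq_max):
        return None
    if assumption == "max":
        daily = qty_max * freq_max
    elif assumption == "min":
        daily = qty_min * freq_min
    elif assumption == "mid":
        daily = ((qty_min + qty_max) / 2) * ((freq_min + freq_max) / 2)
    else:
        raise ValueError(f"unknown assumption: {assumption!r}")
    if daily <= 0:
        return None
    return quantity_dispensed / daily


if __name__ == "__main__":
    # 56 tablets dispensed, "1-2 tablets twice a day".
    for mode in ("max", "min", "mid"):
        print(mode, days_supply(56, 1, 2, 2, 2, mode))
```

For 56 tablets with an instruction of 1 to 2 tablets twice a day, the "max" assumption gives 14 days and the "min" assumption gives 28 days, which is the kind of sensitivity a study team would want to test for the medicine under investigation.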


