

Developing Crowdsourced Training Data Sets for Pharmacovigilance Intelligent Automation.

Author information

Gartland Alex, Bate Andrew, Painter Jeffery L, Casperson Tim A, Powell Gregory Eugene

Affiliations

College of Medicine, University of Central Florida, Orlando, FL, USA.

Safety and Medical Governance, GlaxoSmithKline, London, UK.

Publication information

Drug Saf. 2021 Mar;44(3):373-382. doi: 10.1007/s40264-020-01028-w. Epub 2020 Dec 22.

Abstract

INTRODUCTION

Machine learning offers an alluring solution to developing automated approaches to the increasing individual case safety report burden being placed upon pharmacovigilance. Leveraging crowdsourcing to annotate unstructured data may provide accurate, efficient, and contemporaneous training data sets in support of machine learning.

OBJECTIVE

The objective of this study was to evaluate whether crowdsourcing can be used to accurately and efficiently develop training data sets in support of pharmacovigilance automation.

MATERIALS AND METHODS

Pharmacovigilance experts created a reference dataset by reviewing 15,490 de-identified social media posts of narratives pertaining to 15 drugs and 22 medically relevant topics. A random sample of posts from the reference dataset was published on Amazon Mechanical Turk, and its users (Turkers) were asked a series of questions about those same medical concepts. Accuracy, price elasticity, and time efficiency were evaluated.

RESULTS

Crowdsourced curation exceeded 90% accuracy when compared with the reference dataset and was completed in about 5% of the time. Higher pay increased time efficiency but made no significant difference to accuracy. Additionally, having a social media post reviewed by more than one Turker (using a voting system) did not significantly improve accuracy.
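The comparison described above (single-reviewer labels versus voted labels, each scored against the expert reference) can be sketched in a few lines. This is only an illustrative sketch: the abstract does not specify the voting rule, so a simple majority vote is assumed here, and the post IDs, labels, and function names are all hypothetical toy values, not data or results from the study.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among one post's crowd answers.

    Assumption: simple majority; the study's actual voting rule is not
    described in the abstract.
    """
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of posts whose predicted label matches the reference label."""
    matches = sum(predicted[post_id] == reference[post_id] for post_id in reference)
    return matches / len(reference)


# Hypothetical expert reference labels for one yes/no question.
reference = {"post1": "yes", "post2": "no", "post3": "yes", "post4": "no"}

# Hypothetical crowd answers: three Turkers per post.
crowd = {
    "post1": ["yes", "yes", "no"],
    "post2": ["no", "no", "no"],
    "post3": ["no", "yes", "yes"],
    "post4": ["yes", "no", "yes"],
}

# Single-reviewer condition: keep only the first Turker's answer.
single = {post_id: votes[0] for post_id, votes in crowd.items()}

# Voting condition: aggregate all three answers per post.
voted = {post_id: majority_vote(votes) for post_id, votes in crowd.items()}

single_accuracy = accuracy(single, reference)
voted_accuracy = accuracy(voted, reference)
print(f"single reviewer: {single_accuracy:.2f}, majority vote: {voted_accuracy:.2f}")
```

The toy numbers only show how the two conditions would be scored. They say nothing about the size of the effect, which the study reports as not significant.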

CONCLUSIONS

Crowdsourcing is an accurate and efficient method that can be used to develop training data sets in support of pharmacovigilance automation. More research is needed to better understand the breadth and depth of possible uses as well as strengths, limitations, and generalizability of results.

