BioSift: A Dataset for Filtering Biomedical Abstracts for Drug Repurposing and Clinical Meta-Analysis.

Author information

Kartchner David, Al-Hussaini Irfan, Turner Haydn, Deng Jennifer, Lohiya Shubham, Bathala Prasanth, Mitchell Cassie

Affiliation

Georgia Institute of Technology, Atlanta, Georgia, USA.

Publication information

Int ACM SIGIR Conf Res Dev Inf Retr. 2023 Jul;2023:2913-2923. doi: 10.1145/3539618.3591897. Epub 2023 Jul 18.

DOI:10.1145/3539618.3591897

PMID:38690157

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11060830/

Abstract

This work presents BioSift, a new document classification dataset designed to expedite the initial selection and labeling of studies for drug repurposing. The dataset consists of 10,000 human-annotated abstracts from scientific articles in PubMed. Each abstract is labeled with up to eight attributes needed to perform meta-analysis using the popular patient-intervention-comparator-outcome (PICO) method: has human subjects, is clinical trial/cohort, has population size, has target disease, has study drug, has comparator group, has a quantitative outcome, and an "aggregate" label. Each abstract was annotated by 3 different annotators (i.e., biomedical students), and randomly sampled abstracts were reviewed by senior annotators to ensure quality. Data statistics such as reviewer agreement, label co-occurrence, and confidence are reported. Robust benchmark results show that neither PubMed advanced filters nor state-of-the-art document classification schemes (e.g., active learning, weak supervision, full supervision) can efficiently replace human annotation. In short, BioSift is a pivotal but challenging document classification task for expediting drug repurposing. The full annotated dataset is publicly available and supports the development of document classification algorithms that enhance drug repurposing.
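The abstract describes each record as carrying up to eight binary attributes from three annotators, summarized through statistics such as reviewer agreement and label co-occurrence. The sketch below shows one way such data could be handled in Python. It is illustrative only: the snake_case field names, the 0/1 encoding, majority-vote aggregation, and the "all annotators agree" rate are assumptions, not the dataset's documented schema or the paper's method (the paper may use other agreement measures or a different definition of the "aggregate" label).

```python
from collections import Counter
from itertools import combinations

# Hypothetical identifiers for the eight attributes named in the abstract;
# the released dataset's actual column names may differ.
LABELS = [
    "has_human_subjects",
    "is_clinical_trial_or_cohort",
    "has_population_size",
    "has_target_disease",
    "has_study_drug",
    "has_comparator_group",
    "has_quantitative_outcome",
    "aggregate",
]


def majority_vote(annotations: list[dict[str, int]]) -> dict[str, int]:
    """Collapse several annotators' 0/1 labels into one label per attribute.

    Assumes an odd number of annotators (3 per abstract), so ties cannot occur.
    """
    n = len(annotations)
    return {label: int(sum(a[label] for a in annotations) * 2 > n) for label in LABELS}


def unanimous_rate(dataset: list[list[dict[str, int]]], label: str) -> float:
    """Fraction of abstracts on which every annotator gave the same value for `label`."""
    agree = sum(len({a[label] for a in anns}) == 1 for anns in dataset)
    return agree / len(dataset)


def cooccurrence(rows: list[dict[str, int]]) -> Counter:
    """Count how often each pair of attributes is positive in the same abstract."""
    counts: Counter = Counter()
    for row in rows:
        positives = [label for label in LABELS if row[label]]
        counts.update(combinations(positives, 2))
    return counts


def make(**positives: int) -> dict[str, int]:
    """Build one annotator's label dict (hypothetical helper); unset attributes are 0."""
    return {label: positives.get(label, 0) for label in LABELS}


# Toy example: two abstracts, three annotators each (made-up data, not from BioSift).
toy = [
    [make(has_human_subjects=1, has_study_drug=1),
     make(has_human_subjects=1, has_study_drug=1),
     make(has_human_subjects=1)],
    [make(), make(has_human_subjects=1), make()],
]
gold = [majority_vote(anns) for anns in toy]
print(gold[0]["has_study_drug"])                   # 1 (2 of 3 annotators)
print(gold[1]["has_human_subjects"])               # 0 (1 of 3 annotators)
print(unanimous_rate(toy, "has_human_subjects"))   # 0.5
print(cooccurrence(gold)[("has_human_subjects", "has_study_drug")])  # 1
```

Majority voting is only the simplest way to turn three annotations into one training label; chance-corrected measures such as Fleiss' kappa are a common alternative to raw agreement rates when reporting reviewer agreement.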