Semi-automated screening of biomedical citations for systematic reviews.

Author information

Department of Computer Science, Tufts University, Medford, MA, USA.

Publication information

BMC Bioinformatics. 2010 Jan 26;11:55. doi: 10.1186/1471-2105-11-55.

Abstract

BACKGROUND

Systematic reviews address a specific clinical question by unbiasedly assessing and analyzing the pertinent literature. Citation screening is a time-consuming and critical step in systematic reviews. Typically, reviewers must evaluate thousands of citations to identify articles eligible for a given review. We explore the application of machine learning techniques to semi-automate citation screening, thereby reducing the reviewers' workload.

RESULTS

We present a novel online classification strategy for citation screening to automatically discriminate "relevant" from "irrelevant" citations. We use an ensemble of Support Vector Machines (SVMs) built over different feature-spaces (e.g., abstract and title text), and trained interactively by the reviewer(s). Semi-automating the citation screening process is difficult because any such strategy must identify all citations eligible for the systematic review. This requirement is made harder still due to class imbalance; there are far fewer "relevant" than "irrelevant" citations for any given systematic review. To address these challenges we employ a custom active-learning strategy developed specifically for imbalanced datasets. Further, we introduce a novel undersampling technique. We provide experimental results over three real-world systematic review datasets, and demonstrate that our algorithm is able to reduce the number of citations that must be screened manually by nearly half in two of these, and by around 40% in the third, without excluding any of the citations eligible for the systematic review.
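The pieces named above (per-feature-space SVMs, undersampling of the majority class, and reviewer-in-the-loop active learning) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the custom active-learning or undersampling techniques, so plain random undersampling and uncertainty sampling stand in for them here. The "relevant if any model says so" voting rule, the subgradient-descent trainer, and all function names and data layouts are likewise assumptions made for the example.

```python
import random

def decision(model, x):
    """Signed score of a linear model (w, b); a positive score means 'relevant'."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM by hinge-loss subgradient descent; labels are +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision((w, b), xi) < 1
            # L2 shrinkage always applies; the hinge term only on margin violations.
            w = [wj - lr * (lam * wj - (yi * xj if violated else 0.0))
                 for wj, xj in zip(w, xi)]
            if violated:
                b += lr * yi
    return w, b

def undersample(labels, rng):
    """Keep every relevant index plus an equal-sized random draw of irrelevant ones.
    (Plain random undersampling: a stand-in, not the paper's technique.)"""
    pos = [i for i, yi in enumerate(labels) if yi == 1]
    neg = [i for i, yi in enumerate(labels) if yi == -1]
    return pos + rng.sample(neg, min(len(pos), len(neg)))

def ensemble_predict(models, views):
    """Assumed voting rule: relevant (+1) if ANY per-feature-space model says so."""
    return 1 if any(decision(m, v) > 0 for m, v in zip(models, views)) else -1

def most_uncertain(models, pool):
    """Uncertainty sampling: the pooled citation closest to any model's boundary.
    `pool` maps a citation id to its list of feature vectors, one per feature space."""
    return min(pool, key=lambda cid: min(abs(decision(m, v))
                                         for m, v in zip(models, pool[cid])))

def active_learning_round(labeled, pool, n_views, rng):
    """Retrain one model per feature space on a balanced sample, then pick the
    next citation to show the reviewer. `labeled` maps id -> (views, label) and
    is assumed to contain at least one example of each class."""
    ids = list(labeled)
    labels = [labeled[c][1] for c in ids]
    keep = undersample(labels, rng)
    models = [train_linear_svm([labeled[ids[i]][0][v] for i in keep],
                               [labels[i] for i in keep])
              for v in range(n_views)]
    return models, most_uncertain(models, pool)
```

In an interactive session, the citation returned by `active_learning_round` would be shown to the reviewer, moved from `pool` into `labeled` with the reviewer's decision, and the round repeated. Once labeling stops, `ensemble_predict` is applied to whatever remains in the pool. The permissive any-vote rule reflects the requirement that no eligible citation be excluded, at the cost of more false positives.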

CONCLUSIONS

We have developed a semi-automated citation screening algorithm for systematic reviews that has the potential to substantially reduce the number of citations reviewers have to manually screen, without compromising the quality and comprehensiveness of the review.
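The two quantities at stake in this claim, the share of manual screening saved and the share of eligible citations retained, can be computed as in the short sketch below. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not figures from the study.

```python
def screening_metrics(total, manually_screened, eligible_total, eligible_found):
    """Return (workload_reduction, recall) for one screening run.

    workload_reduction: fraction of citations the reviewers did not have to read.
    recall: fraction of truly eligible citations that were still identified.
    """
    workload_reduction = (total - manually_screened) / total
    recall = eligible_found / eligible_total
    return workload_reduction, recall

# Hypothetical run: 1000 citations, 520 read by hand, all 40 eligible ones found.
reduction, recall = screening_metrics(1000, 520, 40, 40)
```

A run is acceptable under the paper's criterion only when recall is 1.0. Workload reduction measures how much effort was saved while meeting that constraint.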

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a61/2824679/e0a3bea3d709/1471-2105-11-55-1.jpg
