
Automation of duplicate record detection for systematic reviews: Deduplicator.

Affiliations

Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia.

Publication information

Syst Rev. 2024 Aug 2;13(1):206. doi: 10.1186/s13643-024-02619-9.

Abstract

BACKGROUND

To describe the algorithm and investigate the efficacy of a novel systematic review automation tool, the Deduplicator, for removing duplicate records from a multi-database systematic review search.

METHODS

We constructed and tested the efficacy of the Deduplicator tool by using 10 previous Cochrane systematic review search results to compare the Deduplicator's 'balanced' algorithm to a semi-manual EndNote method. Two researchers each performed deduplication on the 10 libraries of search results. For five of those libraries, one researcher used the Deduplicator, while the other performed semi-manual deduplication with EndNote. They then switched methods for the remaining five libraries. In addition to this analysis, comparison between the three different Deduplicator algorithms ('balanced', 'focused' and 'relaxed') was performed on two datasets of previously deduplicated search results.
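As context for the comparison above, the following is a minimal, hypothetical sketch of rule-based bibliographic deduplication in general; the field names, normalization, and matching rule are illustrative assumptions, not the Deduplicator's published algorithm.

```python
import re

def normalize(text):
    """Lowercase and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def is_duplicate(rec_a, rec_b):
    """Hypothetical pairwise rule: matching normalized titles, then
    confirm with DOI when both records have one, else with year."""
    if normalize(rec_a["title"]) != normalize(rec_b["title"]):
        return False
    if rec_a.get("doi") and rec_b.get("doi"):
        return rec_a["doi"].lower() == rec_b["doi"].lower()
    return rec_a.get("year") == rec_b.get("year")

def deduplicate(records):
    """Keep the first record of each duplicate cluster (O(n^2) sketch)."""
    kept = []
    for rec in records:
        if not any(is_duplicate(rec, k) for k in kept):
            kept.append(rec)
    return kept
```

Stricter or looser versions of `is_duplicate` would trade precision against recall, which is the kind of trade-off the 'balanced', 'focused', and 'relaxed' settings expose.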

RESULTS

Before deduplication, the mean library size for the 10 systematic reviews was 1962 records. When using the Deduplicator, the mean time to deduplicate was 5 min per 1000 records compared to 15 min with EndNote. The mean error rate with Deduplicator was 1.8 errors per 1000 records in comparison to 3.1 with EndNote. Evaluation of the different Deduplicator algorithms found that the 'balanced' algorithm had the highest mean F1 score of 0.9647. The 'focused' algorithm had the highest mean accuracy of 0.9798 and the highest recall of 0.9757. The 'relaxed' algorithm had the highest mean precision of 0.9896.
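The four reported metrics are related by their standard confusion-matrix definitions; the sketch below shows those relationships (the counts in the test are made up for illustration, not taken from the study).

```python
def dedup_metrics(tp, fp, fn, tn):
    """Standard metrics over duplicate-detection outcomes:
    tp = true duplicates flagged, fp = unique records wrongly flagged,
    fn = duplicates missed, tn = unique records correctly kept."""
    precision = tp / (tp + fp)                  # flagged records that are real duplicates
    recall = tp / (tp + fn)                     # real duplicates that were flagged
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct decisions
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1
```

Under these definitions, a 'relaxed' algorithm that flags fewer, surer pairs raises precision at the cost of recall, while a 'focused' one does the reverse; F1 rewards balancing the two, consistent with the 'balanced' algorithm scoring highest on F1.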

CONCLUSIONS

These results demonstrate that using the Deduplicator for duplicate record detection reduces deduplication time while maintaining or improving accuracy compared to a semi-manual EndNote method. However, further research comparing a wider range of deduplication tools is needed to establish the Deduplicator's relative performance.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f391/11295717/289a8734f7ef/13643_2024_2619_Fig1_HTML.jpg
