Risklick AG, Spin-Off, University of Bern, Bern, Switzerland.
CTU Bern, University of Bern, Bern, Switzerland.
Syst Rev. 2022 Aug 17;11(1):172. doi: 10.1186/s13643-022-02045-9.
Identifying and removing duplicate references when conducting systematic reviews (SRs) remains a major, time-consuming task for authors, who typically check for duplicates manually using built-in features of citation managers. To address the issues of manual deduplication, we developed Deduklick, an automated, efficient, and rapid artificial intelligence-based algorithm. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.
Deduklick's deduplication uses a multistep algorithm: it normalizes the data, calculates a similarity score, and identifies unique and duplicate references from metadata fields such as title, authors, journal, DOI, year, issue, volume, and page range. We measured and compared Deduklick's ability to detect duplicates accurately against the information specialists' standard manual deduplication process in EndNote, on eight existing heterogeneous datasets. Using a sensitivity analysis, we manually cross-compared the efficiency and noise of both methods.
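The normalize-then-score pipeline described above can be sketched as follows. This is an illustration only, not Deduklick's actual implementation: the field weights, the 0.9 threshold, and the DOI shortcut are assumptions made for the example.

```python
# Minimal sketch of multistep deduplication: normalize metadata, compute a
# weighted similarity score across fields, and flag reference pairs above a
# threshold as duplicates. Weights and threshold are illustrative assumptions.
import re
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", text.lower())).strip()

def similarity(ref_a, ref_b):
    """Weighted similarity over title, authors, and journal.
    A matching DOI is treated as a definite duplicate."""
    if ref_a.get("doi") and ref_a.get("doi") == ref_b.get("doi"):
        return 1.0
    weights = {"title": 0.6, "authors": 0.25, "journal": 0.15}  # assumed
    score = 0.0
    for field, weight in weights.items():
        a = normalize(ref_a.get(field, ""))
        b = normalize(ref_b.get(field, ""))
        score += weight * SequenceMatcher(None, a, b).ratio()
    return score

def find_duplicates(references, threshold=0.9):
    """Return index pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(references)):
        for j in range(i + 1, len(references)):
            if similarity(references[i], references[j]) >= threshold:
                pairs.append((i, j))
    return pairs

refs = [
    {"title": "Deduplication in systematic reviews", "authors": "Smith J; Lee K",
     "journal": "Syst Rev", "doi": "10.1000/xyz"},
    {"title": "Deduplication in Systematic Reviews.", "authors": "Smith, J.; Lee, K.",
     "journal": "Systematic Reviews", "doi": "10.1000/xyz"},
    {"title": "An unrelated trial of something else", "authors": "Doe A",
     "journal": "BMJ", "doi": "10.1000/abc"},
]
print(find_duplicates(refs))  # the first two records share a DOI
```

Normalization is what makes the score robust to the formatting differences (capitalization, punctuation, author-name styles) that defeat naive exact-match deduplication.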
Deduklick achieved an average recall of 99.51%, an average precision of 100.00%, and an average F1 score of 99.75%. In contrast, manual deduplication achieved an average recall of 88.65%, an average precision of 99.95%, and an average F1 score of 91.98%. Deduklick thus matched or exceeded expert-level performance on duplicate removal, while preserving high metadata quality and drastically reducing the time spent on analysis. Deduklick represents an efficient, transparent, ergonomic, and time-saving solution for identifying and removing duplicates in SR searches. It could therefore simplify SR production and offer important advantages to scientists, including saving time, increasing accuracy, reducing costs, and contributing to high-quality SRs.
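For readers less familiar with these metrics in a deduplication setting: recall is the fraction of true duplicates that were removed, precision is the fraction of removed records that really were duplicates, and F1 is their harmonic mean. The counts below are invented for illustration (they are not from the paper's datasets), chosen so the resulting scores resemble Deduklick's reported averages.

```python
# Precision, recall, and F1 for duplicate detection, from:
#   tp = duplicates correctly removed
#   fp = unique records wrongly removed
#   fn = duplicates missed
# The counts used below are hypothetical, for illustration only.
def scores(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A tool that misses 1 of 200 duplicates and removes nothing unique:
precision, recall, f1 = scores(tp=199, fp=0, fn=1)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Note that the harmonic mean penalizes imbalance: the manual process's much lower F1 (91.98%) despite near-perfect precision reflects its substantially lower recall, i.e., duplicates left in the dataset.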