Technology-assisted title and abstract screening for systematic reviews: a retrospective evaluation of the Abstrackr machine learning tool.

Affiliation

Alberta Research Centre for Health Evidence, Department of Pediatrics, University of Alberta, 11405-87 Avenue NW, Edmonton, Alberta, T6G 1C9, Canada.

Publication information

Syst Rev. 2018 Mar 12;7(1):45. doi: 10.1186/s13643-018-0707-8.

DOI:10.1186/s13643-018-0707-8
PMID:29530097
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5848519/
Abstract

BACKGROUND

Machine learning tools can expedite systematic review (SR) processes by semi-automating citation screening. Abstrackr semi-automates citation screening by predicting relevant records. We evaluated its performance for four screening projects.

METHODS

We used a convenience sample of screening projects completed at the Alberta Research Centre for Health Evidence, Edmonton, Canada: three SRs and one descriptive analysis for which we had used SR screening methods. The projects were heterogeneous with respect to search yield (median 9328; range 5243 to 47,385 records; interquartile range (IQR) 15,688 records), topic (Antipsychotics, Bronchiolitis, Diabetes, Child Health SRs), and screening complexity. We uploaded the records to Abstrackr and screened until it made predictions about the relevance of the remaining records. Across three trials for each project, we compared the predictions to human reviewer decisions and calculated the sensitivity, specificity, precision, false negative rate, proportion missed, and workload savings.

RESULTS

Abstrackr's sensitivity was > 0.75 for all projects and the mean specificity ranged from 0.69 to 0.90 with the exception of Child Health SRs, for which it was 0.19. The precision (proportion of records correctly predicted as relevant) varied by screening task (median 26.6%; range 14.8 to 64.7%; IQR 29.7%). The median false negative rate (proportion of records incorrectly predicted as irrelevant) was 12.6% (range 3.5 to 21.2%; IQR 12.3%). The workload savings were often large (median 67.2%, range 9.5 to 88.4%; IQR 23.9%). The proportion missed (proportion of records predicted as irrelevant that were included in the final report, out of the total number predicted as irrelevant) was 0.1% for all SRs and 6.4% for the descriptive analysis. This equated to 4.2% (range 0 to 12.2%; IQR 7.8%) of the records in the final reports.
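The metrics reported above can be illustrated with a minimal Python sketch, assuming the standard confusion-matrix definitions. The abstract defines precision and proportion missed in words. The formulas used here for sensitivity, specificity, false negative rate, and workload savings (records predicted irrelevant as a share of all predicted records) are assumptions, not taken from the abstract. The names `ScreeningCounts`, `screening_metrics`, and `proportion_missed` are hypothetical, and the counts in the example are invented for illustration, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Counts for one screening trial, comparing tool predictions with
    human reviewer decisions (hypothetical structure)."""
    tp: int  # predicted relevant, included by human reviewers
    fp: int  # predicted relevant, excluded by human reviewers
    tn: int  # predicted irrelevant, excluded by human reviewers
    fn: int  # predicted irrelevant, included by human reviewers


def screening_metrics(c: ScreeningCounts) -> dict[str, float]:
    """Return screening metrics under assumed standard definitions.

    No guard against empty denominators: every class is assumed non-empty.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        # Share of records predicted relevant that truly were relevant.
        "precision": c.tp / (c.tp + c.fp),
        # Assumed: relevant records wrongly predicted irrelevant,
        # out of all relevant records.
        "false_negative_rate": c.fn / (c.tp + c.fn),
        # Assumed: records predicted irrelevant (no manual screening
        # needed) out of all records with a prediction.
        "workload_savings": (c.tn + c.fn) / total,
    }


def proportion_missed(missed_in_final_report: int, predicted_irrelevant: int) -> float:
    """Records predicted irrelevant that appear in the final report,
    out of all records predicted irrelevant."""
    return missed_in_final_report / predicted_irrelevant


if __name__ == "__main__":
    # Invented counts for illustration only.
    counts = ScreeningCounts(tp=80, fp=220, tn=680, fn=20)
    for name, value in screening_metrics(counts).items():
        print(f"{name}: {value:.3f}")
    print(f"proportion_missed: {proportion_missed(2, 700):.4f}")
```

With these invented counts, sensitivity is 0.80, precision is about 0.27, and workload savings is 0.70. The example shows how a tool can save most of the screening workload while most of the records it flags as relevant are false positives.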

CONCLUSIONS

Abstrackr's reliability and the workload savings varied by screening task. Workload savings came at the expense of potentially missing relevant records. How this might affect the results and conclusions of SRs needs to be evaluated. Studies evaluating Abstrackr as the second reviewer in a pair would be of interest to determine if concerns for reliability would diminish. Further evaluations of Abstrackr's performance and usability will inform its refinement and practical utility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/865b/5848519/3c4d787dd4f7/13643_2018_707_Fig1_HTML.jpg
