Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research.

Author information

Shemilt Ian, Arno Anneliese, Thomas James, Lorenc Theo, Khouja Claire, Raine Gary, Sutcliffe Katy, D'Souza Preethy, Kwan Irene, Wright Kath, Sowden Amanda

Affiliations

EPPI-Centre, UCL Social Research Institute, University College London, London, WC1H 0NR, UK.

Centre for Reviews and Dissemination, University of York, York, UK.

Publication information

Wellcome Open Res. 2024 Mar 26;6:210. doi: 10.12688/wellcomeopenres.17141.2. eCollection 2021.

Abstract

BACKGROUND

Identifying new, eligible studies for integration into living systematic reviews and maps usually relies on conventional Boolean updating searches of multiple databases and manual processing of the updated results. Automated searches of a single, comprehensive, continuously updated source, with adjunctive machine learning, could enable more efficient searching, selection and prioritisation workflows for updating (living) reviews and maps, though research is needed to establish this. Microsoft Academic Graph (MAG) is a potentially comprehensive single source that also contains metadata usable in machine learning to help identify eligible studies efficiently. This study sought to establish whether: (a) MAG was a sufficiently sensitive single source to maintain our living map of COVID-19 research; and (b) eligible records could be identified with an acceptably high level of specificity.

METHODS

We conducted an eight-arm cost-effectiveness analysis to assess the costs, recall and precision of semi-automated workflows, incorporating MAG with adjunctive machine learning, for continually updating our living map. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Our systematic review software, EPPI-Reviewer, was adapted to incorporate MAG and associated machine learning workflows, and also used to collect data on recall, precision, and manual screening workload.
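The abstract reports recall and precision without restating how they are computed. The sketch below uses the standard information-retrieval definitions. The counts are hypothetical and serve only to illustrate the arithmetic. They are not data from this study. The function names are our own and are not part of EPPI-Reviewer. How the study built its reference set of eligible records is not stated in this abstract.

```python
def recall(eligible_found: int, eligible_total: int) -> float:
    """Share of all known eligible records that a workflow retrieved."""
    if eligible_total <= 0:
        raise ValueError("eligible_total must be positive")
    return eligible_found / eligible_total


def precision(eligible_found: int, records_screened: int) -> float:
    """Share of retrieved-and-screened records that turned out to be eligible."""
    if records_screened <= 0:
        raise ValueError("records_screened must be positive")
    return eligible_found / records_screened


# Hypothetical counts for illustration only (not data from this study).
found, total_eligible, screened = 900, 1000, 3000
print(f"recall={recall(found, total_eligible):.2f}")      # recall=0.90
print(f"precision={precision(found, screened):.2f}")      # precision=0.30
```

Recall penalises eligible studies that a workflow misses. Precision reflects the manual screening burden per eligible study found.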

RESULTS

The semi-automated MAG-enabled workflow dominated conventional workflows in both the base case and sensitivity analyses. At one month, our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified 469 additional eligible articles for inclusion in our living map, and cost £3,179 less per week, compared with conventional methods relying on Boolean searches of Medline and Embase.
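In cost-effectiveness analysis, "dominated" has a specific meaning. One option dominates another when it costs no more, is at least as effective, and is strictly better on at least one of the two. The sketch below applies that rule to the two figures reported above, assuming effect is measured as additional eligible records identified. The class and function names are hypothetical. The reported figures use different time bases (a one-month count and a weekly cost), but this does not affect the result because the rule only inspects their signs.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Incremental results of a new workflow versus a comparator (hypothetical type)."""
    delta_cost: float    # new minus comparator cost (negative = cheaper)
    delta_effect: float  # new minus comparator effect (e.g. extra eligible records)


def classify(c: Comparison) -> str:
    """Label a comparison using the usual cost-effectiveness-plane dominance rules."""
    if c.delta_cost == 0 and c.delta_effect == 0:
        return "equivalent"
    if c.delta_cost <= 0 and c.delta_effect >= 0:
        return "dominant"   # never costlier, never less effective, better on one
    if c.delta_cost >= 0 and c.delta_effect <= 0:
        return "dominated"  # never cheaper, never more effective, worse on one
    return "trade-off"      # an ICER (delta_cost / delta_effect) would be needed


# Figures reported in the abstract: 469 additional eligible articles at one month
# and a cost £3,179 lower per week, versus Boolean searches of Medline and Embase.
mag = Comparison(delta_cost=-3179.0, delta_effect=469.0)
print(classify(mag))  # dominant
```

When neither option dominates, the usual next step is to compute an incremental cost-effectiveness ratio rather than a simple label.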

CONCLUSIONS

We were able to increase recall and coverage of a large living map while reducing its production costs. This finding is likely to be transferable to OpenAlex, MAG's successor database platform.
