Evaluation of a prototype machine learning tool to semi-automate data extraction for systematic literature reviews.

Author affiliations

Takeda Pharmaceuticals International AG, Thurgauerstrasse 130, 8152, Glattpark-Opfikon, Zurich, Switzerland.

Oxford PharmaGenesis, Oxford, UK.

Publication information

Syst Rev. 2023 Oct 6;12(1):187. doi: 10.1186/s13643-023-02351-w.

DOI:10.1186/s13643-023-02351-w

PMID:37803451

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10557215/

Abstract

BACKGROUND

Evidence-based medicine requires synthesis of research through rigorous and time-intensive systematic literature reviews (SLRs), with significant resource expenditure for data extraction from scientific publications. Machine learning may enable the timely completion of SLRs and reduce errors by automating data identification and extraction.

METHODS

We evaluated the use of machine learning to extract data from publications related to SLRs in oncology (SLR 1) and Fabry disease (SLR 2). SLR 1 predominantly contained interventional studies and SLR 2 observational studies. Predefined key terms and data were manually annotated to train and test bidirectional encoder representations from transformers (BERT) and bidirectional long short-term memory (BiLSTM) machine learning models. Using human annotation as a reference, we assessed the ability of the models to identify biomedical terms of interest (entities) and their relations. We also pretrained BERT on a corpus of 100,000 open access clinical publications, enhanced context-dependent entity classification with a conditional random field (CRF) model, or both. Performance was measured using the F score, a metric that combines precision and recall. We defined successful matches as partial overlap of entities of the same type.
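To make the evaluation criterion concrete, the following is a minimal sketch of an F score computed under partial-overlap matching, where a predicted entity counts as correct if it shares at least one character with a human-annotated entity of the same type. The abstract does not specify the exact matching procedure, so the `Entity` span representation, the function names, the micro-averaging across entity types, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

from typing import NamedTuple


class Entity(NamedTuple):
    """A labelled character span; `end` is exclusive (hypothetical representation)."""
    start: int
    end: int
    label: str


def overlaps(a: Entity, b: Entity) -> bool:
    """True if both spans have the same label and share at least one character."""
    return a.label == b.label and a.start < b.end and b.start < a.end


def partial_match_f_score(gold: list[Entity], predicted: list[Entity]) -> dict[str, float]:
    """Precision, recall and F score under partial-overlap matching (assumed scheme)."""
    # Precision: share of predicted entities that overlap some gold entity.
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    # Recall: share of gold entities that are overlapped by some prediction.
    matched_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    # F score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Toy data, invented purely for illustration (not taken from the study).
gold = [Entity(0, 10, "AGE"), Entity(20, 30, "DOSAGE"), Entity(40, 50, "PFS")]
predicted = [
    Entity(2, 8, "AGE"),       # partial overlap, same type: match
    Entity(20, 30, "PFS"),     # same span as a gold entity but wrong type: no match
    Entity(45, 55, "PFS"),     # partial overlap, same type: match
    Entity(60, 70, "DOSAGE"),  # no overlapping gold entity: no match
]
scores = partial_match_f_score(gold, predicted)
print(scores)
```

In this toy case two of four predictions match (precision 0.5) and two of three gold entities are found (recall about 0.67), giving an F score of 4/7, roughly 0.57. A stricter exact-boundary criterion would reject both partial matches, which is one reason partial-overlap scores tend to be more lenient.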

RESULTS

For entity recognition, the pretrained BERT+CRF model had the best performance, with an F score of 73% in SLR 1 and 70% in SLR 2. Entity types identified with the highest accuracy were metrics for progression-free survival (SLR 1, F score 88%) or for patient age (SLR 2, F score 82%). Treatment arm dosage was identified less successfully (F scores 60% [SLR 1] and 49% [SLR 2]). The best-performing model for relation extraction, pretrained BERT relation classification, exhibited F scores higher than 90% in cases with at least 80 relation examples for a pair of related entity types.

CONCLUSIONS

The performance of BERT is enhanced by pretraining with biomedical literature and by combining with a CRF model. With refinement, machine learning may assist with manual data extraction for SLRs.
