

Evaluation of SURUS: a named entity recognition NLP system to extract knowledge from interventional study records.

Authors

Peeters Casper, Vijverberg Koen, Pouwer Marianne, Westerman Bart, Boot Maikel, Verberne Suzan

Affiliations

Medstone Science, Amsterdam, The Netherlands.

Amsterdam University Medical Center (UMC), Amsterdam, The Netherlands.

Publication

BMC Med Res Methodol. 2025 Jul 31;25(1):184. doi: 10.1186/s12874-025-02624-z.

DOI: 10.1186/s12874-025-02624-z
PMID: 40745274
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12315421/
Abstract

BACKGROUND

Medical decision-making is commonly guided by evidence-based analyses from systematic literature reviews (SLRs), which require large amounts of time and subject-matter expertise to perform. Automated extraction of key datapoints from clinical publications could speed up the assembly of systematic literature reviews. To this end, we built SURUS, a named entity recognition (NER) system comprising a Bidirectional Encoder Representations from Transformers (BERT) model trained on a fine-grained dataset. The aim of this study was to assess the quality of SURUS classifications of PICO (patient, intervention, comparator and outcome) and study design elements in clinical study abstracts.
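The paper does not include code; as a minimal sketch of the token-classification data format an NER system of this kind is trained on, the snippet below converts character-level entity annotations into a BIO tag sequence. The sentence and PICO-style entity labels are hypothetical, not taken from the paper's dataset:

```python
def to_bio(tokens, entities):
    """Convert (start, end, label) character spans to per-token BIO tags.
    tokens: list of (text, start, end); entities: list of (start, end, label)."""
    tags = ["O"] * len(tokens)
    for e_start, e_end, label in entities:
        first = True
        for i, (_, t_start, t_end) in enumerate(tokens):
            if t_start >= e_start and t_end <= e_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags

# Hypothetical sentence with a patient and an intervention entity.
sent = "Adults with T2D received metformin"
tokens, pos = [], 0
for w in sent.split():
    start = sent.index(w, pos)
    tokens.append((w, start, start + len(w)))
    pos = start + len(w)
entities = [(0, 15, "PATIENT"), (25, 34, "INTERVENTION")]
print(to_bio(tokens, entities))
# → ['B-PATIENT', 'I-PATIENT', 'I-PATIENT', 'O', 'B-INTERVENTION']
```

In practice a BERT tokenizer splits words into subword pieces, so these word-level tags would additionally be aligned to subword tokens before fine-tuning.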

METHODS

The PubMedBERT-based model was trained and evaluated on a dataset of 39,531 labels across 400 clinical abstracts, with an inter-annotator agreement of 0.81 (Cohen's κ) and 0.88 (F1). The labels were manually annotated following a strict annotation guide. We evaluated the quality of the dataset and tested the utility of the model in the practice of systematic literature screening by comparing SURUS predictions to expert PICO and design classifications. Additionally, we tested the out-of-domain quality of the model across 7 other therapeutic areas and another study design.
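Cohen's κ, as reported above, corrects the annotators' raw agreement for the agreement expected by chance given each annotator's label distribution. A self-contained illustration with toy label sequences (not the paper's data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement from the two marginal label distributions
    expected = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy annotations over 8 items; the paper reports kappa = 0.81 on its dataset.
ann1 = ["P", "P", "I", "O", "O", "I", "P", "O"]
ann2 = ["P", "P", "I", "O", "I", "I", "P", "P"]
print(round(cohens_kappa(ann1, ann2), 3))
# → 0.628
```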

RESULTS

The SURUS NER system achieved an overall F1 score of 0.95, with minor deviation between labels. In addition, SURUS achieved an NER F1 of 0.90 and 0.84 for out-of-domain therapeutic-area and observational study abstracts, respectively. Finally, the F1 of PICO and study design classifications was 0.89, with a recall of 0.96, compared to expert classifications.
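Entity-level F1 scores of the kind reported here are typically computed over exact-match (start, end, label) spans: a prediction counts as correct only if both boundaries and the label match the gold annotation. A small sketch with hypothetical spans:

```python
def span_f1(gold, pred):
    """Exact-match entity-level precision, recall and F1.
    gold, pred: sets of (start, end, label) spans."""
    tp = len(gold & pred)                                   # exact matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical spans for one abstract: the OUTCOME boundary is off by two
# characters, so it counts as a miss under exact matching.
gold = {(0, 15, "PATIENT"), (25, 34, "INTERVENTION"), (40, 52, "OUTCOME")}
pred = {(0, 15, "PATIENT"), (25, 34, "INTERVENTION"), (40, 50, "OUTCOME")}
p, r, f = span_f1(gold, pred)
print(round(f, 3))
# → 0.667
```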

CONCLUSION

The system reaches an F1 score of 0.95 across 25 contextually different medical named entities. This high-quality in-domain medical entity prediction by a fine-tuned BERT-based model was the result of a strict annotation guideline and high inter-annotator agreement. This prediction accuracy was largely preserved in extensive out-of-domain evaluation, indicating the system's utility across other indication areas and study types. Current approaches in the field lack the fine-grained training data and versatility demonstrated here. We think this approach sets a new standard in medical literature analysis and paves the way for creating fine-grained datasets of labelled entities that can be used for downstream analysis beyond traditional SLRs.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1186/s12874-025-02624-z.


Figures:
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/852f/12315421/818bfdebe552/12874_2025_2624_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/852f/12315421/278c7576b7b7/12874_2025_2624_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/852f/12315421/47af6f340482/12874_2025_2624_Fig3_HTML.jpg

Similar articles

1. Evaluation of SURUS: a named entity recognition NLP system to extract knowledge from interventional study records.
BMC Med Res Methodol. 2025 Jul 31;25(1):184. doi: 10.1186/s12874-025-02624-z.

2. Development of a Natural Language Processing Model for Extracting Kidney Biopsy Pathology Diagnoses.
Kidney Med. 2025 Jun 14;7(8):101047. doi: 10.1016/j.xkme.2025.101047. eCollection 2025 Aug.

3. Family History Extraction From Synthetic Clinical Narratives Using Natural Language Processing: Overview and Evaluation of a Challenge Data Set and Solutions for the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) Competition.
JMIR Med Inform. 2021 Jan 27;9(1):e24008. doi: 10.2196/24008.

4. Use of deep learning-based NLP models for full-text data elements extraction for systematic literature review tasks.
Sci Rep. 2025 Jun 3;15(1):19379. doi: 10.1038/s41598-025-03979-5.

5. Natural language processing in medical text processing: A scoping literature review.
Int J Med Inform. 2025 Dec;204:106049. doi: 10.1016/j.ijmedinf.2025.106049. Epub 2025 Jul 17.

6. From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports.
Comput Biol Med. 2025 Sep;195:110665. doi: 10.1016/j.compbiomed.2025.110665. Epub 2025 Jun 24.

7. Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition.
J Am Med Inform Assoc. 2025 Mar 1;32(3):555-565. doi: 10.1093/jamia/ocae326.

8. Detecting Redundant Health Survey Questions by Using Language-Agnostic Bidirectional Encoder Representations From Transformers Sentence Embedding: Algorithm Development Study.
JMIR Med Inform. 2025 Jun 10;13:e71687. doi: 10.2196/71687.

9. Knowledge Graph-Enhanced Deep Learning Model (H-SYSTEM) for Hypertensive Intracerebral Hemorrhage: Model Development and Validation.
J Med Internet Res. 2025 Jun 12;27:e66055. doi: 10.2196/66055.

10. Transformers for extracting breast cancer information from Spanish clinical narratives.
Artif Intell Med. 2023 Sep;143:102625. doi: 10.1016/j.artmed.2023.102625. Epub 2023 Jul 13.
