Enhancing Bidirectional Encoder Representations From Transformers (BERT) With Frame Semantics to Extract Clinically Relevant Information From German Mammography Reports: Algorithm Development and Validation.

Authors

Reichenpfader Daniel, Knupp Jonas, von Däniken Sandro Urs, Gaio Roberto, Dennstädt Fabio, Cereghetti Grazia Maria, Sander André, Hiltbrunner Hans, Nairz Knud, Denecke Kerstin

Affiliations

Institute for Patient-Centered Digital Health, School of Engineering and Computer Science, Bern University of Applied Sciences, Biel/Bienne, Switzerland.

PhD School of Life Sciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland.

Publication information

J Med Internet Res. 2025 Apr 25;27:e68427. doi: 10.2196/68427.

DOI:10.2196/68427
PMID:40279645
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12064967/
Abstract

BACKGROUND

Structured reporting is essential for improving the clarity and accuracy of radiological information. Despite its benefits, the European Society of Radiology notes that it is not widely adopted. For example, while structured reporting frameworks such as the Breast Imaging Reporting and Data System provide standardized terminology and classification for mammography findings, radiology reports still mostly comprise free-text sections. This variability complicates the systematic extraction of key clinical data. Moreover, manual structuring of reports is time-consuming and prone to inconsistencies. Recent advancements in large language models have shown promise for clinical information extraction by enabling models to understand contextual nuances in medical text. However, challenges such as domain adaptation, privacy concerns, and generalizability remain. To address these limitations, frame semantics offers an approach to information extraction grounded in computational linguistics, allowing a structured representation of clinically relevant concepts.

OBJECTIVE

This study explores the combination of Bidirectional Encoder Representations from Transformers (BERT) architecture with the linguistic concept of frame semantics to extract and normalize information from free-text mammography reports.

METHODS

After creating an annotated corpus of 210 German reports for fine-tuning, we generate several BERT model variants by applying 3 pretraining strategies to hospital data. Afterward, a fact extraction pipeline is built, comprising an extractive question-answering model and a sequence labeling model. We quantitatively evaluate all model variants using common evaluation metrics (model perplexity, Stanford Question Answering Dataset 2.0 [SQuAD_v2], seqeval) and perform a qualitative clinician evaluation of the entire pipeline on a manually generated synthetic dataset of 21 reports, as well as a comparison with a generative approach following best practice prompting techniques using the open-source Llama 3.3 model (Meta).
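The two-stage control flow described above (an extractive question-answering step that locates a fact, followed by a sequence labeling step that tags entities inside it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the report text, the question wording, the labels `SHAPE` and `SIDE`, and the stand-in functions `toy_qa` and `toy_tagger` are all hypothetical. In the study both stages are fine-tuned BERT models; here they are replaced by keyword and lexicon lookups so the sketch runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Span:
    start: int
    end: int
    text: str


@dataclass
class Fact:
    fact_type: str
    span: Span
    entities: list = field(default_factory=list)  # list of (label, text) pairs


# Interfaces of the two stages (assumed for this sketch).
QAModel = Callable[[str, str], Optional[Span]]  # (question, context) -> span or None
Tagger = Callable[[list], list]                 # tokens -> BIO tags


def decode_bio(tokens, tags):
    """Group BIO tags into (label, text) entities."""
    entities, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label:
                entities.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            if label:
                entities.append((label, " ".join(buffer)))
            label, buffer = None, []
    if label:
        entities.append((label, " ".join(buffer)))
    return entities


def extract_facts(report, questions, qa: QAModel, tagger: Tagger):
    """Stage 1 finds a span per fact type; stage 2 labels entities inside it."""
    facts = []
    for fact_type, question in questions.items():
        span = qa(question, report)
        if span is None:  # unanswerable question, as allowed in SQuAD 2.0
            continue
        tokens = span.text.split()
        facts.append(Fact(fact_type, span, decode_bio(tokens, tagger(tokens))))
    return facts


def toy_qa(question, context):
    """Hypothetical stand-in for the QA model: keyword match per sentence."""
    keyword = question.split()[-1].rstrip("?").lower()
    for sentence in context.split("."):
        if keyword in sentence.lower():
            text = sentence.strip()
            start = context.index(text)
            return Span(start, start + len(text), text)
    return None


LEXICON = {"round": "B-SHAPE", "left": "B-SIDE"}  # hypothetical labels


def toy_tagger(tokens):
    """Hypothetical stand-in for the sequence labeling model: lexicon lookup."""
    return [LEXICON.get(token.lower(), "O") for token in tokens]


REPORT = "Round mass in the left breast. No suspicious calcifications."
QUESTIONS = {"mass": "Where is a mass?", "lymph_node": "Where is a lymph node?"}

facts = extract_facts(REPORT, QUESTIONS, toy_qa, toy_tagger)
for fact in facts:
    print(fact.fact_type, "|", fact.span.text, "|", fact.entities)
```

Keeping the two stages behind plain callables means either stand-in could be swapped for a real model without touching `extract_facts`; the question about a lymph node returns no span and is simply skipped.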

RESULTS

Our system is capable of extracting 14 fact types and 40 entities from the clinical findings section of mammography reports. Further pretraining on hospital data reduced model perplexity, although it did not significantly impact the 2 downstream tasks. We achieved average F-scores of 90.4% and 81% for question answering and sequence labeling, respectively (best pretraining strategy). Qualitative evaluation of the pipeline based on synthetic data shows an overall precision of 96.1% and 99.6% for facts and entities, respectively. In contrast, generative extraction shows an overall precision of 91.2% and 87.3% for facts and entities, respectively. Hallucinations and extraction inconsistencies were observed.
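For readers unfamiliar with the metrics reported above, the sketch below shows how each kind of number is typically computed. It is a simplified illustration under stated assumptions: `token_f1` follows the token-overlap idea of the SQuAD answer F-score but only lowercases and splits on whitespace, `perplexity` uses the generic definition (the exponential of the mean negative log-likelihood), and how the authors computed perplexity for a masked language model is not stated in the abstract. The example inputs are made up and are not the study's data.

```python
import math
from collections import Counter


def precision(true_positives: int, false_positives: int) -> float:
    """Share of extracted items that were judged correct."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F-score between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # With an empty (no-answer) side, score 1 only if both sides are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(pred_tokens)
    r = overlap / len(ref_tokens)
    return 2 * p * r / (p + r)


def perplexity(token_log_probs) -> float:
    """Exponential of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up inputs for illustration only.
print(precision(3, 1))
print(token_f1("round mass left", "round mass"))
print(perplexity([math.log(0.5)] * 4))
```

Lower perplexity means the language model assigns higher probability to held-out text, which is why it can improve after further pretraining without necessarily changing the downstream F-scores.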

CONCLUSIONS

This study demonstrates that frame semantics provides a robust and interpretable framework for automating structured reporting. By leveraging frame semantics, the approach enables customizable information extraction and supports generalization to diverse radiological domains and clinical contexts with additional annotation efforts. Furthermore, the BERT-based model architecture allows for efficient, on-premise deployment, ensuring data privacy. Future research should focus on validating the model's generalizability across external datasets and different report types to ensure its broader applicability in clinical practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0697/12064967/7f865ddc1092/jmir_v27i1e68427_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0697/12064967/4d2d0639da03/jmir_v27i1e68427_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0697/12064967/11a13caf6c6e/jmir_v27i1e68427_fig2.jpg
