A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning.

Authors

Mu Youqing, Tizhoosh Hamid R, Tayebi Rohollah Moosavi, Ross Catherine, Sur Monalisa, Leber Brian, Campbell Clinton J V

Affiliations

McMaster University, Hamilton, ON Canada.

Kimia Lab, University of Waterloo, Waterloo, ON Canada.

Publication

Commun Med (Lond). 2021 Jul 5;1:11. doi: 10.1038/s43856-021-00008-0. eCollection 2021.

DOI:10.1038/s43856-021-00008-0
PMID:35602188
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9053264/
Abstract

BACKGROUND

Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the inherent information. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets.

METHODS

Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model's hidden layer.
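The two ideas in this paragraph, mapping a synopsis to one or more labels and choosing which synopses an expert should label next, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the model emits one logit per label, that labels are assigned independently with a sigmoid and a 0.5 threshold, and that the active learning step uses uncertainty sampling; the abstract states none of these details. The label names and function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Multi-label decision: keep every label whose probability reaches the threshold.

    Assumption: labels are scored independently, so a synopsis may receive
    zero, one, or several labels.
    """
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


def uncertainty(logits):
    """Score a synopsis by its most ambiguous label.

    A probability of exactly 0.5 gives the maximum score of 0.5;
    probabilities near 0 or 1 give scores near 0.
    """
    return max(0.5 - abs(sigmoid(z) - 0.5) for z in logits)


def select_for_review(pool_logits, k):
    """Return the indices of the k unlabeled synopses the model is least sure about.

    Assumption: uncertainty sampling stands in for whatever query strategy
    the study actually used.
    """
    ranked = sorted(
        range(len(pool_logits)),
        key=lambda i: uncertainty(pool_logits[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical label set and model outputs, for illustration only.
labels = ["hypercellular", "erythroid hyperplasia", "iron deficiency"]
assigned = predict_labels([2.0, -1.0, 0.3], labels)

pool = [[4.0, -4.0], [0.1, 3.0], [2.0, -2.0]]
to_annotate = select_for_review(pool, k=2)
```

In a loop of this kind, the synopses returned by `select_for_review` would go to an expert for labeling, be added to the training set, and the model would be retrained before the next round.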

RESULTS

Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025.
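For readers unfamiliar with the metric, a micro-average F1 score pools true positives, false positives and false negatives across all labels and all cases before computing F1, so frequent labels carry more weight than rare ones. The sketch below shows that calculation for multi-label predictions. The function name and the toy label sets are hypothetical and are not taken from the study's data.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label predictions.

    Each element of true_sets and pred_sets is the set of labels for one case.
    Counts are pooled over every case and label before F1 is computed.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Toy example with hypothetical labels "a", "b", "c".
truth = [{"a", "b"}, {"c"}, {"a"}]
preds = [{"a"}, {"c", "b"}, {"a"}]
score = micro_f1(truth, preds)  # tp=3, fp=1, fn=1 -> 6 / 8 = 0.75
```

A reported value such as 0.779 ± 0.025 would typically be the mean and spread of this score over repeated evaluation runs; the abstract does not say how the spread was computed.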

CONCLUSIONS

We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/9de59063d4a4/43856_2021_8_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/71808df89f09/43856_2021_8_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/5d28efff5fec/43856_2021_8_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/0c669aa3d414/43856_2021_8_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/80bc0ec87737/43856_2021_8_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e09/9053264/c3e4bef7ed94/43856_2021_8_Fig6_HTML.jpg


