
Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study.

Authors

Xiong Ying, Chen Shuai, Chen Qingcai, Yan Jun, Tang Buzhou

Affiliations

Harbin Institute of Technology, Shenzhen, China.

Peng Cheng Laboratory, Shenzhen, China.

Publication details

JMIR Med Inform. 2020 Dec 29;8(12):e23357. doi: 10.2196/23357.

DOI:10.2196/23357
PMID:33372664
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7803475/
Abstract

BACKGROUND

With the popularity of electronic health records (EHRs), the quality of health care has been improved. However, there are also some problems caused by EHRs, such as the growing use of copy-and-paste and templates, resulting in EHRs of low quality in content. In order to minimize data redundancy in different documents, Harvard Medical School and Mayo Clinic organized a national natural language processing (NLP) clinical challenge (n2c2) on clinical semantic textual similarity (ClinicalSTS) in 2019. The task of this challenge is to compute the semantic similarity among clinical text snippets.

OBJECTIVE

In this study, we aim to investigate novel methods to model ClinicalSTS and analyze the results.

METHODS

We propose a semantically enhanced text matching model for the 2019 n2c2/Open Health NLP (OHNLP) challenge on ClinicalSTS. The model includes 3 representation modules to encode clinical text snippet pairs at different levels: (1) character-level representation module based on convolutional neural network (CNN) to tackle the out-of-vocabulary problem in NLP; (2) sentence-level representation module that adopts a pretrained language model bidirectional encoder representation from transformers (BERT) to encode clinical text snippet pairs; and (3) entity-level representation module to model clinical entity information in clinical text snippets. In the case of entity-level representation, we compare 2 methods. One encodes entities by the entity-type label sequence corresponding to text snippet (called entity I), whereas the other encodes entities by their representation in MeSH, a knowledge graph in the medical domain (called entity II).
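The character-level module in (1) can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional convolution over character embeddings followed by ReLU and max-over-time pooling, which is a common way to build character-level CNN encoders. The function name `char_cnn_encode`, the embedding size, the filter count, the filter width, and the random (untrained) weights are all hypothetical. The point is only that any word, including one missing from a word-level vocabulary, still maps to a fixed-size vector.

```python
import random


def char_cnn_encode(word, emb_dim=4, n_filters=3, width=3, seed=0):
    """Encode a word as a fixed-size vector with a toy character-level CNN.

    All weights are random and untrained; sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical character embedding table over printable ASCII.
    alphabet = [chr(c) for c in range(32, 127)]
    emb = {ch: [rng.uniform(-1, 1) for _ in range(emb_dim)] for ch in alphabet}
    # Hypothetical convolution filters, each spanning `width` characters.
    filters = [
        [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]
    # Unknown characters and padding positions use a zero vector.
    pad = [0.0] * emb_dim
    chars = [emb.get(ch, pad) for ch in word]
    # Pad short words so that at least one full window exists.
    while len(chars) < width:
        chars.append(pad)

    features = []
    for filt in filters:
        activations = []
        # Slide the filter over every window of `width` consecutive characters.
        for start in range(len(chars) - width + 1):
            window = chars[start:start + width]
            score = sum(
                w * x
                for filt_row, char_row in zip(filt, window)
                for w, x in zip(filt_row, char_row)
            )
            activations.append(max(0.0, score))  # ReLU
        # Max-over-time pooling keeps one value per filter.
        features.append(max(activations))
    return features


# A rare drug name and a short token both yield vectors of the same length.
rare_word_vector = char_cnn_encode("hydrochlorothiazide")
short_word_vector = char_cnn_encode("mg")
```

In a full model, a vector like this would be combined with the BERT sentence representation and the entity-level representation; how the three are fused is not stated in the abstract.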

RESULTS

We conducted experiments on the ClinicalSTS corpus of the 2019 n2c2/OHNLP challenge to evaluate model performance. The model using only BERT for text snippet pair encoding achieved a Pearson correlation coefficient (PCC) of 0.848. When the character-level and entity-level representations were added individually, the PCC increased to 0.857 and to 0.854 (entity I) or 0.859 (entity II), respectively. When both the character-level and entity-level representations were added, the PCC increased further to 0.861 (entity I) and 0.868 (entity II).
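For readers unfamiliar with the evaluation metric, the sketch below shows how a Pearson correlation coefficient between gold and predicted similarity scores can be computed, and how the reported gains over the BERT-only baseline follow from the figures in this abstract. The helper name `pearson` and the sample score lists are illustrative assumptions, not data from the study; only the values in `reported_pcc` come from the abstract.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up gold and predicted similarity scores, for illustration only.
gold = [0.5, 1.0, 2.5, 4.0, 5.0]
predicted = [0.8, 1.2, 2.0, 3.6, 4.9]
example_pcc = pearson(gold, predicted)

# PCC values as reported in the abstract.
reported_pcc = {
    "BERT only": 0.848,
    "+ character": 0.857,
    "+ entity I": 0.854,
    "+ entity II": 0.859,
    "+ character + entity I": 0.861,
    "+ character + entity II": 0.868,
}
baseline = reported_pcc["BERT only"]
# Absolute gain of each configuration over the BERT-only baseline.
gains = {name: round(pcc - baseline, 3) for name, pcc in reported_pcc.items()}
```

By this arithmetic, the largest reported gain is 0.020, for the configuration that combines the character-level representation with entity II.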

CONCLUSIONS

Experimental results show that both character-level information and entity-level information can effectively enhance the BERT-based STS model.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf6e/7803475/4efd3a7b3c34/medinform_v8i12e23357_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf6e/7803475/0589b3f3152a/medinform_v8i12e23357_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf6e/7803475/6e3e9648a5aa/medinform_v8i12e23357_fig3.jpg

Similar articles

1
Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study.
JMIR Med Inform. 2020 Dec 29;8(12):e23357. doi: 10.2196/23357.
2
The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.
JMIR Med Inform. 2020 Nov 27;8(11):e23375. doi: 10.2196/23375.
3
Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning.
JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508.
4
Incorporating Domain Knowledge Into Language Models by Using Graph Convolutional Networks for Assessing Semantic Textual Similarity: Model Development and Performance Comparison.
JMIR Med Inform. 2021 Nov 26;9(11):e23101. doi: 10.2196/23101.
5
Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis.
JMIR Med Inform. 2021 May 26;9(5):e23099. doi: 10.2196/23099.
6
Adapting Bidirectional Encoder Representations from Transformers (BERT) to Assess Clinical Semantic Textual Similarity: Algorithm Development and Validation Study.
JMIR Med Inform. 2021 Feb 3;9(2):e22795. doi: 10.2196/22795.
7
Distributed representation and one-hot representation fusion with gated network for clinical semantic textual similarity.
BMC Med Inform Decis Mak. 2020 Apr 30;20(Suppl 1):72. doi: 10.1186/s12911-020-1045-z.
8
Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models.
JMIR Med Inform. 2020 Nov 23;8(11):e19735. doi: 10.2196/19735.
9
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
10
Chinese Clinical Named Entity Recognition From Electronic Medical Records Based on Multisemantic Features by Using Robustly Optimized Bidirectional Encoder Representation From Transformers Pretraining Approach Whole Word Masking and Convolutional Neural Networks: Model Development and Validation.
JMIR Med Inform. 2023 May 10;11:e44597. doi: 10.2196/44597.

Cited by

1
Language model and its interpretability in biomedicine: A scoping review.
iScience. 2024 Feb 24;27(4):109334. doi: 10.1016/j.isci.2024.109334. eCollection 2024 Apr 19.
2
Impact of a Clinical Text-Based Fall Prediction Model on Preventing Extended Hospital Stays for Elderly Inpatients: Model Development and Performance Evaluation.
JMIR Med Inform. 2022 Jul 27;10(7):e37913. doi: 10.2196/37913.
