ALBERT-Based Self-Ensemble Model With Semisupervised Learning and Data Augmentation for Clinical Semantic Textual Similarity Calculation: Algorithm Validation Study.

Authors

Li Junyi, Zhang Xuejie, Zhou Xiaobing

Affiliation

School of Information Science and Engineering, Yunnan University, Kunming, China.

Publication

JMIR Med Inform. 2021 Jan 22;9(1):e23086. doi: 10.2196/23086.

DOI:10.2196/23086
PMID:33480858
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7864778/
Abstract

BACKGROUND

In recent years, with increases in the amount of information available and the importance of information screening, increased attention has been paid to the calculation of textual semantic similarity. In the field of medicine, electronic medical records and medical research documents have become important data resources for clinical research. Medical textual semantic similarity calculation has become an urgent problem to be solved.

OBJECTIVE

This research aims to solve 2 problems: (1) medical data sets are small, so models learn and understand the task insufficiently, and (2) information is lost during long-distance propagation, so models fail to grasp key information.

METHODS

This paper combines a text data augmentation method and a self-ensemble ALBERT model under semisupervised learning to perform clinical textual semantic similarity calculations.
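The three ingredients named above can be illustrated with a toy sketch. The abstract does not specify how augmentation, pseudo-labeling, or the self-ensemble are implemented, so everything below is an assumption for illustration: augmentation is shown as swapping the two sentences of a pair, the self-ensemble as averaging the scores of several checkpoints of one model, and semisupervised learning as attaching those averaged scores to unlabeled pairs. The function names, sentences, and scores are hypothetical, and no ALBERT model is loaded.

```python
from statistics import mean


def augment_by_swapping(pairs):
    """Add (b, a, score) for every (a, b, score).

    Assumption: similarity is symmetric, so the swapped pair keeps its label.
    This is one simple augmentation, not necessarily the one used in the paper.
    """
    return pairs + [(b, a, score) for a, b, score in pairs]


def self_ensemble(checkpoint_preds):
    """Average per-example scores across checkpoints of a single model."""
    return [mean(scores) for scores in zip(*checkpoint_preds)]


def pseudo_label(unlabeled_pairs, checkpoint_preds):
    """Attach ensembled scores to unlabeled pairs as soft training labels."""
    scores = self_ensemble(checkpoint_preds)
    return [(a, b, s) for (a, b), s in zip(unlabeled_pairs, scores)]


# Hypothetical labeled example.
labeled = [("patient denies chest pain", "no chest pain reported", 4.5)]
augmented = augment_by_swapping(labeled)

# Hypothetical unlabeled pairs and made-up scores from three checkpoints.
unlabeled = [
    ("take one tablet daily", "one tablet by mouth every day"),
    ("follow up in two weeks", "blood pressure was normal"),
]
checkpoint_preds = [[4.0, 1.0], [4.4, 0.6], [4.2, 0.8]]
pseudo = pseudo_label(unlabeled, checkpoint_preds)

training_set = augmented + pseudo
print(len(training_set), "training pairs after augmentation and pseudo-labeling")
```

In a real pipeline the per-checkpoint scores would come from a fine-tuned model rather than hand-written lists, and pseudo-labeled pairs would usually be filtered or down-weighted before retraining.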

RESULTS

Compared with the methods submitted to the 2019 National Natural Language Processing Clinical Challenges (n2c2)/Open Health Natural Language Processing (OHNLP) shared task track on Clinical Semantic Textual Similarity, our method surpasses the best result by 2 percentage points and achieves a Pearson correlation coefficient of 0.92.
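For readers unfamiliar with the metric, the Pearson correlation coefficient between gold similarity scores and predicted scores can be computed as in the minimal sketch below. The gold and predicted values are invented for illustration and are not taken from the paper; the function name `pearson` is likewise hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up gold annotations and model predictions for five sentence pairs.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
pred = [0.4, 1.2, 2.9, 4.1, 4.8]
print(f"Pearson r = {pearson(gold, pred):.3f}")
```

A value near 1 means the predicted scores rise and fall with the human annotations; the coefficient is insensitive to a constant offset or scale in the predictions.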

CONCLUSIONS

When a medical data set is small, data augmentation can increase its size, and improved semisupervised learning can boost the learning efficiency of the model. Additionally, self-ensemble methods improve model performance. Our method performs well and has great potential for related medical problems.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b889/7864778/25cf5f1e5af7/medinform_v9i1e23086_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b889/7864778/588f875788c3/medinform_v9i1e23086_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b889/7864778/66fa2711829c/medinform_v9i1e23086_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b889/7864778/a9f93d775ec5/medinform_v9i1e23086_fig3.jpg
