Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora.

Authors

Pérez Alicia, Weegar Rebecka, Casillas Arantza, Gojenola Koldo, Oronoz Maite, Dalianis Hercules

Affiliations

IXA Group, University of the Basque Country (UPV-EHU), Spain.

Clinical Text Mining Group, Department of Computer and System Sciences (DSV), Stockholm University, Sweden.

Publication details

J Biomed Inform. 2017 Jul;71:16-30. doi: 10.1016/j.jbi.2017.05.009. Epub 2017 May 16.

DOI: 10.1016/j.jbi.2017.05.009
PMID: 28526460
Abstract

OBJECTIVE

The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations.

METHODS

The significance of this work lies in its experimental design: the experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, conditional random fields (CRF), Perceptron, and support vector machines (SVM). The classifiers were enhanced with ensembles of semantic spaces and ensembles of Brown trees. To mitigate data sparsity without significantly increasing the dimension of the decision space, we propose clustered versions of both representations: the hierarchical Brown clustering, represented as trees, and vector quantization of each semantic space.
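As a rough illustration of how Brown-tree features are commonly fed to a sequence classifier such as a CRF, the sketch below turns each word's path in a binary Brown hierarchy into prefix features; truncating the path to a shorter prefix yields a coarser cluster, so related words end up sharing features. The bit strings, the prefix lengths (2, 4, 6), and the function name `brown_features` are hypothetical; the abstract does not specify how the study encoded its trees.

```python
# Hypothetical Brown-cluster paths (toy values, not from the study's corpora):
# each word maps to the bit string of its leaf in the binary merge tree.
BROWN_PATHS = {
    "fiebre": "0010110",
    "tos": "0010111",
    "paracetamol": "1101001",
}

# Assumed prefix lengths; a shorter prefix corresponds to a coarser cluster.
PREFIX_LENGTHS = (2, 4, 6)


def brown_features(word, paths=BROWN_PATHS, lengths=PREFIX_LENGTHS):
    """Return sparse indicator features built from prefixes of the word's Brown path."""
    path = paths.get(word.lower())
    if path is None:
        # Out-of-vocabulary words get a single shared fallback feature.
        return {"brown=UNK": 1.0}
    feats = {}
    for n in lengths:
        # If the path is shorter than n, slicing simply returns the whole path.
        feats[f"brown_p{n}={path[:n]}"] = 1.0
    return feats


print(brown_features("Fiebre"))
print(brown_features("tos"))
```

Here "fiebre" and "tos" differ only in their last bit, so all three prefix features coincide, which is how a classifier can generalize from one to the other.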

RESULTS

The results showed that the semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when the semantic spaces were used directly.
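The abstract does not say which quantizer was used; assuming plain k-means (Lloyd's algorithm), vector quantization of a semantic space might be sketched as follows. Each word's dense vector is replaced by the index of its nearest centroid, so a space with d real-valued dimensions contributes one categorical feature per token (one of k indicators) instead of d values. The toy vectors, k = 2, and the helper names `kmeans`, `nearest`, and `vq_feature` are illustrative assumptions.

```python
import random


def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])),
    )


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy 3-dimensional "semantic space" (hypothetical vectors, not the study's).
space = {
    "fiebre": [0.9, 0.1, 0.0],
    "tos": [0.8, 0.2, 0.1],
    "paracetamol": [0.0, 0.9, 0.8],
    "ibuprofeno": [0.1, 0.8, 0.9],
}
centroids = kmeans(list(space.values()), k=2)
codebook = {w: nearest(v, centroids) for w, v in space.items()}


def vq_feature(word):
    """One categorical feature per token instead of len(vector) real values."""
    return {f"vq={codebook[word]}": 1.0}


print(codebook)
print(vq_feature("tos"))
```

The resulting feature count depends on k rather than on the embedding dimension, which is the kind of trade-off the results paragraph describes; the actual cluster ids depend on the random initialization.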

CONCLUSIONS

The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
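To show how an ensemble of representations can feed one of the listed classifiers, the following sketch merges features from several Brown trees and several quantized spaces into one sparse dictionary per token, namespacing each member so its features stay distinct, and trains a basic multiclass perceptron on them. The trees, codebooks, labels, and training tokens are invented, and `ensemble_features` and `Perceptron` are hypothetical names; the study's actual feature templates, context windows, and perceptron variant are not described in the abstract.

```python
from collections import defaultdict

# Hypothetical ensemble members (toy values): two Brown trees trained on
# different data, and one quantized semantic space (word -> cluster id).
BROWN_TREES = [
    {"fiebre": "0010", "tos": "0011", "paracetamol": "1101"},
    {"fiebre": "110", "tos": "111", "paracetamol": "010"},
]
VQ_SPACES = [
    {"fiebre": 0, "tos": 0, "paracetamol": 1},
]


def ensemble_features(word):
    """Union of all members' features, each namespaced by member index."""
    w = word.lower()
    feats = {f"word={w}": 1.0, "bias": 1.0}
    for i, tree in enumerate(BROWN_TREES):
        if w in tree:
            feats[f"brown{i}_p2={tree[w][:2]}"] = 1.0
    for i, codebook in enumerate(VQ_SPACES):
        if w in codebook:
            feats[f"vq{i}={codebook[w]}"] = 1.0
    return feats


class Perceptron:
    """Multiclass perceptron over sparse feature dicts (no weight averaging)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # (label, feature) -> weight

    def score(self, feats, label):
        return sum(self.w.get((label, f), 0.0) * v for f, v in feats.items())

    def predict(self, feats):
        # Ties go to the first label in self.labels.
        return max(self.labels, key=lambda y: self.score(feats, y))

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:
                    # Reward the gold label's features, penalize the wrong guess.
                    for f, v in feats.items():
                        self.w[(gold, f)] += v
                        self.w[(guess, f)] -= v


training_data = [
    (ensemble_features("fiebre"), "DISEASE"),
    (ensemble_features("paracetamol"), "DRUG"),
]
model = Perceptron(["DISEASE", "DRUG"])
model.train(training_data)
# "tos" never appears in training but shares Brown and VQ features with "fiebre".
print(model.predict(ensemble_features("tos")))  # prints DISEASE
```

The unseen word "tos" is labeled through the features it shares with "fiebre", a miniature version of using unsupervised representations to compensate for scarce annotated data.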

Similar articles

1
Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora.
J Biomed Inform. 2017 Jul;71:16-30. doi: 10.1016/j.jbi.2017.05.009. Epub 2017 May 16.
2
Ensembles of randomized trees using diverse distributed representations of clinical events.
BMC Med Inform Decis Mak. 2016 Jul 21;16 Suppl 2(Suppl 2):69. doi: 10.1186/s12911-016-0309-0.
3
Measuring the effect of different types of unsupervised word representations on Medical Named Entity Recognition.
Int J Med Inform. 2019 Sep;129:100-106. doi: 10.1016/j.ijmedinf.2019.05.022. Epub 2019 Jun 5.
4
Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches.
BMC Med Inform Decis Mak. 2019 Dec 23;19(Suppl 7):274. doi: 10.1186/s12911-019-0981-y.
5
Unsupervised entity and relation extraction from clinical records in Italian.
Comput Biol Med. 2016 May 1;72:263-75. doi: 10.1016/j.compbiomed.2016.01.014. Epub 2016 Jan 23.
6
Identifying adverse drug event information in clinical notes with distributional semantic representations of context.
J Biomed Inform. 2015 Oct;57:333-49. doi: 10.1016/j.jbi.2015.08.013. Epub 2015 Aug 17.
7
Improving clinical named entity recognition in Chinese using the graphical and phonetic feature.
BMC Med Inform Decis Mak. 2019 Dec 23;19(Suppl 7):273. doi: 10.1186/s12911-019-0980-z.
8
A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.
J Biomed Inform. 2016 Aug;62:21-31. doi: 10.1016/j.jbi.2016.05.004. Epub 2016 May 13.
9
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation.
JMIR Med Inform. 2020 May 4;8(5):e17637. doi: 10.2196/17637.
10
Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx.
AMIA Annu Symp Proc. 2015 Nov 5;2015:1296-305. eCollection 2015.

Cited by

1
Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes.
Database (Oxford). 2024 Jul 30;2024. doi: 10.1093/database/baae068.
2
Advances in monolingual and crosslingual automatic disability annotation in Spanish.
BMC Bioinformatics. 2023 Jun 26;24(1):265. doi: 10.1186/s12859-023-05372-3.
3
Evaluation of Natural Language Processing for the Identification of Crohn Disease-Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project.
JMIR Med Inform. 2022 Feb 18;10(2):e30345. doi: 10.2196/30345.
4
Recent advances in Swedish and Spanish medical entity recognition in clinical texts using deep neural approaches.
BMC Med Inform Decis Mak. 2019 Dec 23;19(Suppl 7):274. doi: 10.1186/s12911-019-0981-y.
5
Expanding the Diversity of Texts and Applications: Findings from the Section on Clinical Natural Language Processing of the International Medical Informatics Association Yearbook.
Yearb Med Inform. 2018 Aug;27(1):193-198. doi: 10.1055/s-0038-1667080. Epub 2018 Aug 29.