
Expansion of medical vocabularies using distributional semantics on Japanese patient blogs.

Author information

Ahltorp Magnus, Skeppstedt Maria, Kitajima Shiho, Henriksson Aron, Rzepka Rafal, Araki Kenji

Affiliations

, Stockholm, Sweden.

Department of Computer Science, Linnaeus University/Gavagai, Växjö/Stockholm, Sweden.

Publication information

J Biomed Semantics. 2016 Sep 26;7(1):58. doi: 10.1186/s13326-016-0093-x.


DOI: 10.1186/s13326-016-0093-x
PMID: 27671202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5037651/
Abstract

BACKGROUND: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to the limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, also essential for text mining of corpora written in languages other than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthography as well as text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.

METHODS: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3 × 100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new vocabulary terms were suggested. The method was evaluated on its ability to retrieve the remaining n terms in existing medical vocabularies.

RESULTS: Removing case particles and using a context window size of 1+1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8+8 was better for Body Part. For a candidate list of length 10n, the choice of cluster size affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of the top n candidates for Body Part, however, clusters of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25% for a candidate list of the top n terms and a recall of 68% for the top 10n. For the top 10n candidates, the second-best results were obtained for Medical Finding, with a recall of 58%, compared to 46% for Body Part. Taking only the top n candidates into account, however, resulted in a recall of 23% for Body Part, compared to 16% for Medical Finding.

CONCLUSIONS: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
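The METHODS paragraph models term distributions with random indexing. The following is a minimal Python sketch of that technique, not the authors' implementation. It assumes the corpus has already been segmented into tokens (Japanese is written without spaces, so a morphological analyser would be needed first), and the dimensionality, the number of non-zero elements, the random seed and the particle set `CASE_PARTICLES` are illustrative assumptions. The `left`/`right` parameters correspond to the 1+1 and 8+8 context windows compared in the study.

```python
import math
import random
from collections import defaultdict

# Illustrative (assumed) subset of Japanese case particles; a real pipeline
# would rely on part-of-speech tags from a morphological analyser instead.
CASE_PARTICLES = frozenset({"が", "を", "に", "へ", "と", "で", "から", "より", "の"})


def sparse_index_vector(dim, nonzeros, rng):
    """Random ternary index vector stored as (position, sign) pairs."""
    positions = rng.sample(range(dim), nonzeros)
    # Half of the non-zero entries are +1 and half are -1.
    return [(p, 1 if k % 2 == 0 else -1) for k, p in enumerate(positions)]


def random_indexing(sentences, dim=1000, nonzeros=8, left=1, right=1,
                    drop=frozenset(), seed=0):
    """Build a context vector per word by summing the index vectors of its
    neighbours inside a left+right window; tokens in `drop` are removed
    before the window is applied."""
    rng = random.Random(seed)
    index = {}  # word -> sparse random index vector
    context = defaultdict(lambda: [0.0] * dim)  # dense for readability
    for sentence in sentences:
        tokens = [t for t in sentence if t not in drop]
        for i, word in enumerate(tokens):
            start, stop = max(0, i - left), min(len(tokens), i + right + 1)
            for j in range(start, stop):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index:
                    index[neighbour] = sparse_index_vector(dim, nonzeros, rng)
                vec = context[word]
                for pos, sign in index[neighbour]:
                    vec[pos] += sign
    return dict(context)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    corpus = [["頭", "が", "痛い"], ["腹", "が", "痛い"], ["薬", "を", "飲む"]]
    vectors = random_indexing(corpus, left=1, right=1, drop=CASE_PARTICLES)
    # 頭 (head) and 腹 (stomach) share the same context (痛い) once particles are dropped.
    print(cosine(vectors["頭"], vectors["腹"]))
```

Calling `random_indexing(..., left=1, right=1, drop=CASE_PARTICLES)` mirrors the setting reported as best for Medical Finding and Pharmaceutical Drug, while `left=8, right=8` with the default empty `drop` mirrors the Body Part setting.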

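The clustering and candidate-extraction steps, and the recall measure used in the evaluation, can be sketched in the same way. This is a hypothetical illustration: the abstract does not specify the linkage criterion or how closeness to the centroids is scored, so average linkage over cosine similarity, a hard cap `max_size` on the number of terms per cluster, and scoring each unknown term by its most similar centroid are all assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def average_linkage(c1, c2):
    """Mean pairwise cosine similarity between two clusters."""
    return sum(cosine(u, v) for u in c1 for v in c2) / (len(c1) * len(c2))


def cluster_seeds(seed_vectors, max_size):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters whose combined size does not exceed max_size (assumed cap)."""
    clusters = [[v] for v in seed_vectors]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i]) + len(clusters[j]) > max_size:
                    continue
                sim = average_linkage(clusters[i], clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best is None:  # no merge possible without exceeding max_size
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


def rank_candidates(vectors, seeds, max_size, top_k):
    """Score every non-seed term by its similarity to the nearest cluster
    centroid and return the top_k terms."""
    seed_set = set(seeds)
    clusters = cluster_seeds([vectors[s] for s in seeds], max_size)
    centroids = [centroid(c) for c in clusters]
    scores = {
        word: max(cosine(vec, c) for c in centroids)
        for word, vec in vectors.items()
        if word not in seed_set
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def recall(candidates, held_out):
    """Share of held-out vocabulary terms that appear in the candidate list."""
    held = set(held_out)
    return len(held & set(candidates)) / len(held) if held else 0.0
```

In the study's terms, `top_k` would be n or 10n, where n is the number of held-out vocabulary terms, and `max_size` plays the role of the cluster size whose effect the RESULTS paragraph reports.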



