An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource.

Authors

Ibrahim Mohammed, Gauch Susan, Salman Omar, Alqahtani Mohammed

Affiliation

Computer Science and Computer Engineering, University of Arkansas at Fayetteville, Fayetteville, AR, United States.

Publication information

PeerJ Comput Sci. 2021 Aug 9;7:e668. doi: 10.7717/peerj-cs.668. eCollection 2021.

DOI:10.7717/peerj-cs.668
PMID:34458573
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8371999/
Abstract

BACKGROUND

Clear language makes communication easier between any two parties. A layperson may have difficulty communicating with a professional because they do not understand the specialized terms common to the domain. In healthcare, laypeople knowledgeable in medical terminology are rare, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map lay medical terms to professional medical terms and vice versa.

OBJECTIVE

Many of the existing vocabularies are built manually or semi-automatically, requiring large investments of time and human effort; as a consequence, these vocabularies grow slowly. In this paper, we present an automatic method to enrich lay vocabularies that can also be applied to vocabularies in any domain.

METHODS

Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Representation (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two lay datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary.
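The retrieval step described above can be pictured as a nearest-neighbor lookup in embedding space. The following is only a minimal sketch under stated assumptions: the three-dimensional vectors, the vocabulary, and the helper names `cosine` and `nearest_terms` are all hypothetical, and cosine similarity is assumed as the ranking measure; the abstract does not specify how candidates are ranked, and real GloVe vectors would be trained on the healthcare corpus.

```python
import math

# Toy 3-dimensional vectors standing in for trained GloVe embeddings.
# Both the words and the numbers are made up for illustration.
TOY_VECTORS = {
    "hypertension": [0.90, 0.10, 0.00],
    "high_blood_pressure": [0.85, 0.15, 0.05],
    "bp": [0.70, 0.30, 0.10],
    "headache": [0.10, 0.90, 0.20],
    "migraine": [0.15, 0.85, 0.30],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(seed, vectors, top_n=2):
    """Rank every other word by cosine similarity to the seed term."""
    seed_vec = vectors[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


# Candidate lay terms for a professional seed term.
candidates = nearest_terms("hypertension", TOY_VECTORS)
print(candidates)
```

With these toy vectors, the two words pointing in nearly the same direction as the seed are returned first, which is the behavior one would hope for from embeddings trained on real lay text.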

RESULTS

The results show that GloVe was able to find new lay terms with an F-score of 48.44%. Our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. This improvement was statistically significant on both ground-truth datasets (p < 0.001).
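As a quick arithmetic check of the reported gain, the relative improvement can be recomputed from the two F-scores. This is a sketch only: the function name `relative_improvement` is hypothetical, and the 61% figure is treated as exact even though it is reported as an average and is presumably rounded.

```python
def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, as a percentage."""
    return (improved - baseline) / baseline * 100.0


# F-scores as reported in the abstract (percent).
basic_glove_f = 48.44
enhanced_glove_f = 61.0  # reported as an average, presumably rounded

gain = relative_improvement(basic_glove_f, enhanced_glove_f)
print(f"{gain:.1f}%")
```

The result comes out slightly under 26%, which is in line with the 25% quoted above; the small gap likely reflects rounding of the underlying scores.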

CONCLUSIONS

This paper presents an automatic approach to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from a healthcare social media platform and two standard lay vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each lay term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for that term that appeared in the ground-truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
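The corpus expansion and the seed-term evaluation summarized above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `TOY_LEXICON` is a hand-written stand-in for WordNet lookups, placing related words directly after each term is only one plausible way to expand a corpus, and the set-based precision, recall, and F-score are the standard definitions rather than the paper's confirmed evaluation code.

```python
# Hypothetical stand-in for WordNet lookups: related words per lay term.
TOY_LEXICON = {
    "tummy": ["stomach", "belly"],
    "ache": ["pain"],
}


def expand_tokens(tokens, lexicon):
    """Insert each term's related words right after the term itself."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(lexicon.get(token, []))
    return expanded


def precision_recall_f(extracted, ground_truth):
    """Set-based precision, recall, and F-score (harmonic mean)."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    hits = len(extracted & ground_truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Expand one made-up sentence, then score made-up extracted terms
# against a made-up ground-truth concept.
expanded = expand_tokens(["my", "tummy", "ache"], TOY_LEXICON)
p, r, f = precision_recall_f(
    extracted=["belly", "stomach", "knee"],
    ground_truth=["belly", "stomach", "abdomen", "gut"],
)
print(expanded)
print(round(p, 3), round(r, 3), round(f, 3))
```

In this toy case two of the three extracted terms appear in the ground-truth concept and two of its four terms are recovered, so the F-score sits between the precision and the recall.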

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/48f0cefd54b9/peerj-cs-07-668-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/93765d85ad64/peerj-cs-07-668-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/60f78f6f69ac/peerj-cs-07-668-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/f713f43929a3/peerj-cs-07-668-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/6f8268ded7a4/peerj-cs-07-668-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/3a16d686f361/peerj-cs-07-668-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/b82415e98e9a/peerj-cs-07-668-g007.jpg
