

An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource.

Author information

Ibrahim Mohammed, Gauch Susan, Salman Omar, Alqahtani Mohammed

Affiliation

Computer Science and Computer Engineering, University of Arkansas at Fayetteville, Fayetteville, AR, United States.

Publication information

PeerJ Comput Sci. 2021 Aug 9;7:e668. doi: 10.7717/peerj-cs.668. eCollection 2021.

Abstract

BACKGROUND

Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional because they do not understand the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen's medical terms to professional medical terms and vice versa.

OBJECTIVE

Many existing vocabularies are built manually or semi-automatically, which requires large investments of time and human effort and consequently slows their growth. In this paper, we present an automatic method to enrich laymen's vocabularies that has the benefit of being applicable to vocabularies in any domain.

METHODS

Our entirely automatic approach applies machine learning, specifically Global Vectors for Word Representation (GloVe), to a corpus collected from a social media healthcare platform in order to extend and enhance consumer health vocabularies. The approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. Basic GloVe and our novel algorithms incorporating WordNet were evaluated on two laymen datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary.
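The abstract does not say how candidate lay terms are read off the trained embeddings. A common technique, assumed here rather than taken from the paper, is to rank vocabulary terms by cosine similarity to the vector of a seed term. The minimal sketch below uses hypothetical three-dimensional toy vectors and the illustrative names `EMBEDDINGS`, `cosine`, and `nearest_terms`; real GloVe vectors would be trained on the collected healthcare corpus.

```python
import math

# Hypothetical toy "embeddings" (assumption for illustration only). Real GloVe
# vectors would be trained on the healthcare corpus and have 50-300 dimensions.
EMBEDDINGS = {
    "hypertension": [0.90, 0.10, 0.00],
    "high_blood_pressure": [0.85, 0.15, 0.05],
    "htn": [0.80, 0.20, 0.10],
    "headache": [0.10, 0.90, 0.20],
    "migraine": [0.15, 0.85, 0.30],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(seed, embeddings, k=2):
    """Return the k terms most similar to `seed`, most similar first."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed  # never propose the seed term itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:k]]


print(nearest_terms("hypertension", EMBEDDINGS))  # ['high_blood_pressure', 'htn']
```

In this sketch the top-k neighbours of a professional seed term become candidate lay synonyms; how many candidates to keep (here `k`) is a tunable choice, not a value reported in the abstract.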

RESULTS

The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. In addition, the improvement of enhanced GloVe was statistically significant on both ground truth datasets (P < 0.001).
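The abstract reports F-scores without defining them. Assuming the usual F1 score (the harmonic mean of precision and recall) computed by comparing the terms extracted for a seed concept against that concept's ground-truth terms, the scoring and the "relative improvement" can be sketched as follows. The term sets are hypothetical, and `precision_recall_f1` and `relative_improvement` are illustrative names, not the authors' code.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of predicted terms against ground-truth terms."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)  # extracted terms that appear in the concept
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`; 0.25 means +25%."""
    return (new - baseline) / baseline


# Hypothetical ground-truth concept and hypothetical extracted candidates.
gold = {"high blood pressure", "hbp", "htn"}
predicted = ["high blood pressure", "htn", "stroke", "pressure"]
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.50 R=0.67 F1=0.57
print(f"{relative_improvement(0.61, 0.4844):.3f}")  # 0.259
```

With the abstract's rounded figures (61% versus 48.44%) the relative improvement comes out near 0.26, close to the reported 25% (the 61% average is itself rounded).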

CONCLUSIONS

This paper presents an automatic approach to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from , a healthcare social media platform, against two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for that term that appear in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
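The corpus-expansion step can be pictured with a small sketch. The abstract states only that synonyms, hyponyms, and hypernyms are included for each layman term occurrence; inserting them immediately after the occurrence, so they share its co-occurrence context when GloVe statistics are gathered, is an assumption made here. `LEXICON` is a hypothetical hand-written stand-in for WordNet (a real implementation would query WordNet itself), and `expand_tokens` is an illustrative name.

```python
# Hypothetical stand-in for WordNet relations (assumption for illustration).
LEXICON = {
    "synonyms": {"belly": ["abdomen", "tummy"]},
    "hyponyms": {"ache": ["headache", "stomachache"]},
    "hypernyms": {"flu": ["illness"]},
}


def expand_tokens(tokens, layman_terms, lexicon):
    """Insert related terms right after every occurrence of a layman term."""
    expanded = []
    for tok in tokens:
        expanded.append(tok)
        if tok in layman_terms:
            # Assumed order: synonyms, then hyponyms, then hypernyms.
            for relation in ("synonyms", "hyponyms", "hypernyms"):
                expanded.extend(lexicon[relation].get(tok, []))
    return expanded


tokens = "my belly ache started with the flu".split()
print(expand_tokens(tokens, {"belly", "ache", "flu"}, LEXICON))
# ['my', 'belly', 'abdomen', 'tummy', 'ache', 'headache', 'stomachache',
#  'started', 'with', 'the', 'flu', 'illness']
```

The expanded token stream would then be fed to GloVe training in place of the raw corpus, which is one way the auxiliary lexical resource can pull related lay terms closer together in the embedding space.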


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c779/8371999/48f0cefd54b9/peerj-cs-07-668-g001.jpg
