A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations.

Authors

Xu Rong, Musen Mark A, Shah Nigam H

Affiliation

Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA 94305, USA.

Publication

AMIA Annu Symp Proc. 2010 Nov 13;2010:907-11.

Abstract

The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by examining their occurrences in over 18 million MEDLINE abstracts. Our goals were to: (1) analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; (2) create a filtered UMLS Metathesaurus based on the MEDLINE analysis; and (3) augment the UMLS Metathesaurus so that each term is associated with metadata on its MEDLINE frequency and syntactic distribution. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful for identifying errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve the efficiency and precision of UMLS-based information retrieval and NLP tasks.
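The frequency-based filtering described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how terms were matched against MEDLINE or which frequency threshold was applied, so everything here is an assumption made for illustration: the case-insensitive whole-phrase matching, the `min_count` threshold, the function names `count_term_occurrences` and `filter_terms`, and the toy terms and abstracts.

```python
import re
from collections import Counter


def count_term_occurrences(terms, abstracts):
    """Count, for each term, the number of abstracts that contain it.

    Assumption: a term "occurs" if it matches as a whole phrase,
    ignoring case. The paper's actual matching rules may differ.
    """
    counts = Counter({term: 0 for term in terms})
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    for text in abstracts:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts


def filter_terms(counts, min_count=1):
    """Keep only terms seen in at least `min_count` abstracts (hypothetical threshold)."""
    return {term: n for term, n in counts.items() if n >= min_count}


# Toy data, invented for illustration only.
terms = [
    "myocardial infarction",
    "heart attack",
    "cold",
    "infarction of myocardium nos",
]
abstracts = [
    "Acute myocardial infarction remains a leading cause of death.",
    "Patients with a prior heart attack or myocardial infarction were enrolled.",
    "Cold exposure altered metabolic rate in mice.",
]

counts = count_term_occurrences(terms, abstracts)
kept = filter_terms(counts, min_count=1)
retained_fraction = len(kept) / len(terms)
```

In this toy run, the term that never appears in any abstract is dropped, and the surviving terms carry their document frequency as metadata. A scan over millions of terms and abstracts would need an indexed or automaton-based matcher rather than one regular expression per term.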

Cited by

1. CoMNRank: An integrated approach to extract and prioritize human microbial metabolites from MEDLINE records. J Biomed Inform. 2020 Sep;109:103524. doi: 10.1016/j.jbi.2020.103524. Epub 2020 Aug 11.
2. Neophilia Ranking of Scientific Journals. Scientometrics. 2017 Jan;110(1):43-64. doi: 10.1007/s11192-016-2157-1. Epub 2016 Oct 22.
3. Building the graph of medicine from millions of clinical narratives. Sci Data. 2014 Sep 16;1:140032. doi: 10.1038/sdata.2014.32. eCollection 2014.
4. Identifying named entities from PubMed for enriching semantic categories. BMC Bioinformatics. 2015 Feb 21;16:57. doi: 10.1186/s12859-015-0487-2.
5. SimQ: real-time retrieval of similar consumer health questions. J Med Internet Res. 2015 Feb 17;17(2):e43. doi: 10.2196/jmir.3388.
6. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. J Am Med Inform Assoc. 2015 Jan;22(1):121-31. doi: 10.1136/amiajnl-2014-002902. Epub 2014 Oct 21.
7. Molecularly and clinically related drugs and diseases are enriched in phenotypically similar drug-disease pairs. Genome Med. 2014 Aug 17;6(7):52. doi: 10.1186/s13073-014-0052-z. eCollection 2014.
8. Quantifying the impact and extent of undocumented biomedical synonymy. PLoS Comput Biol. 2014 Sep 25;10(9):e1003799. doi: 10.1371/journal.pcbi.1003799. eCollection 2014 Sep.
9. Text mining for adverse drug events: the promise, challenges, and state of the art. Drug Saf. 2014 Oct;37(10):777-90. doi: 10.1007/s40264-014-0218-z.
10. Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inform Assoc. 2014 Nov-Dec;21(6):1069-75. doi: 10.1136/amiajnl-2014-002733. Epub 2014 Jul 2.
