Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS.

Authors

Luo Zhihui, Duffy Robert, Johnson Stephen, Weng Chunhua

Affiliation

Department of Biomedical Informatics, Columbia University.

Publication

Summit Transl Bioinform. 2010 Mar 1;2010:26-30.

Abstract

We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS-recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% of the UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
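The two ambiguity ratios defined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexeme-to-type mapping, the token list, the preference ordering, and the function names are hypothetical and are not taken from the paper. In particular, the abstract does not describe the form of the semantic preference rules; a simple ranked list stands in for them here.

```python
from collections import Counter

# Hypothetical toy annotation: lexeme -> candidate UMLS semantic types.
# The assignments are illustrative, not taken from the paper's corpus.
LEXEME_TYPES = {
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "diabetes": {"Disease or Syndrome"},
    "aspirin": {"Pharmacologic Substance", "Organic Chemical"},
}

# Hypothetical normalized token stream from eligibility criteria sentences.
TOKENS = ["cold", "diabetes", "cold", "aspirin", "diabetes", "diabetes"]

# Hypothetical preference ordering: earlier entries win over later ones.
PREFERENCE = [
    "Disease or Syndrome",
    "Pharmacologic Substance",
    "Organic Chemical",
    "Natural Phenomenon or Process",
]


def lexical_ambiguity(lexeme_types):
    """Total types over unique lexemes / total unique lexemes."""
    total_types = sum(len(types) for types in lexeme_types.values())
    return total_types / len(lexeme_types)


def occurrence_ambiguity(tokens, lexeme_types):
    """Total type occurrences / total word occurrences.

    Each occurrence of a word is assumed to contribute one count per
    candidate type of that word.
    """
    counts = Counter(tokens)
    type_occurrences = sum(n * len(lexeme_types[w]) for w, n in counts.items())
    return type_occurrences / sum(counts.values())


def resolve(types, preference=PREFERENCE):
    """Pick one type per lexeme using the ranked preference list.

    Types missing from the list rank last; ties break alphabetically
    so the result is deterministic.
    """
    rank = {t: i for i, t in enumerate(preference)}
    return min(types, key=lambda t: (rank.get(t, len(rank)), t))


if __name__ == "__main__":
    print("lexical ambiguity:", lexical_ambiguity(LEXEME_TYPES))
    print("occurrence ambiguity:", occurrence_ambiguity(TOKENS, LEXEME_TYPES))
    resolved = {w: {resolve(t)} for w, t in LEXEME_TYPES.items()}
    print("after preference rules:", lexical_ambiguity(resolved))
```

A ratio of 1.0 after resolution corresponds to the abstract's statement that the preference rules eliminate ambiguity: every lexeme is left with exactly one semantic type.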

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c64/3041551/cace208275ed/amia-s2010_cri_026f1.jpg