Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource.

Authors

Alnazzawi Noha, Thompson Paul, Ananiadou Sophia

Affiliation

National Centre for Text Mining, Manchester Institute of Biotechnology, Manchester University, Manchester, United Kingdom.

Publication

PLoS One. 2016 Sep 19;11(9):e0162287. doi: 10.1371/journal.pone.0162287. eCollection 2016.

Abstract

Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
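
The abstract does not say which similarity measures PhenoNorm integrates or how it combines them. As a purely illustrative sketch of the general idea (scoring a mention against the terms of each candidate concept with more than one similarity measure, then linking to the best-scoring concept), the following Python example averages a token-overlap score and a character-level score. The function names, the equal weights, and the toy concept identifiers and terms are all assumptions made for illustration; they are not taken from PhenoNorm or from the UMLS Metathesaurus.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lower-cased alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_jaccard(a, b):
    """Word-set overlap; insensitive to word order."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def char_ratio(a, b):
    """Character-level similarity; tolerant of small spelling variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(mention, term, w_token=0.5, w_char=0.5):
    """Weighted mix of the two measures (weights are an assumption)."""
    return w_token * token_jaccard(mention, term) + w_char * char_ratio(mention, term)


def link_mention(mention, concepts):
    """Return (concept_id, score) for the best-matching concept term."""
    best_id, best_score = None, -1.0
    for concept_id, terms in concepts.items():
        for term in terms:
            score = combined_score(mention, term)
            if score > best_score:
                best_id, best_score = concept_id, score
    return best_id, best_score


# Hypothetical toy terminology: identifiers and term lists are made up.
TOY_CONCEPTS = {
    "TOY:0001": ["congestive heart failure", "heart failure, congestive"],
    "TOY:0002": ["shortness of breath", "dyspnea"],
    "TOY:0003": ["peripheral edema", "swelling of the legs"],
}

if __name__ == "__main__":
    for mention in ["failure of the heart, congestive", "dyspnoea"]:
        print(mention, "->", link_mention(mention, TOY_CONCEPTS))
```

Combining the two measures lets reordered phrases (caught by token overlap) and spelling variants (caught by character similarity) both resolve to the same concept, which is the kind of variability between text types that the abstract describes.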
