Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource.

Authors

Alnazzawi Noha, Thompson Paul, Ananiadou Sophia

Affiliation

National Centre for Text Mining, Manchester Institute of Biotechnology, Manchester University, Manchester, United Kingdom.

Publication information

PLoS One. 2016 Sep 19;11(9):e0162287. doi: 10.1371/journal.pone.0162287. eCollection 2016.

DOI:10.1371/journal.pone.0162287

PMID:27643689

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5028053/

Abstract

Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm's wider utility, by evaluating its ability to link mentions of various other types of medically related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
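
The abstract describes linking a phenotype mention to a known concept by combining several similarity measures. The Python sketch below illustrates that general idea only; it is not PhenoNorm itself. The two measures (character-level ratio and token Jaccard overlap), the equal weighting, the function names, and the toy vocabulary with placeholder IDs are all assumptions made for illustration and do not come from the paper or from the UMLS Metathesaurus.

```python
from difflib import SequenceMatcher

# Toy vocabulary for illustration. The IDs are placeholders,
# not real UMLS concept identifiers.
CONCEPTS = {
    "X001": ["congestive heart failure", "heart failure, congestive"],
    "X002": ["peripheral edema", "swelling of the legs"],
    "X003": ["dyspnea", "shortness of breath"],
}


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between whitespace-separated token sets."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def normalize(mention: str, concepts=CONCEPTS, weight: float = 0.5):
    """Return (concept_id, score) for the best-scoring concept name.

    The score is a weighted sum of the two measures; the weight is an
    arbitrary assumption, not a tuned value.
    """
    mention = mention.lower().strip()
    best_id, best_score = None, 0.0
    for concept_id, names in concepts.items():
        for name in names:
            score = (weight * char_similarity(mention, name)
                     + (1 - weight) * token_jaccard(mention, name))
            if score > best_score:
                best_id, best_score = concept_id, score
    return best_id, best_score


if __name__ == "__main__":
    for text in ["Congestive Heart Failure", "heart failure congestive"]:
        print(text, "->", normalize(text))
```

With this toy vocabulary, a mention that exactly matches a listed name after lowercasing scores 1.0, while reordered variants still rank the intended concept first through token overlap. A realistic system would need a much larger vocabulary, candidate filtering, and a threshold below which no link is made.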

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/90c3/5028053/65dcac34e1f0/pone.0162287.g001.jpg

Similar articles

Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

BMC Med Inform Decis Mak. 2015;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1472-6947-15-S2-S3. Epub 2015 Jun 15.

Towards a semantic lexicon for clinical natural language processing.

AMIA Annu Symp Proc. 2012;2012:568-76. Epub 2012 Nov 3.

NCBI disease corpus: a resource for disease name recognition and concept normalization.

J Biomed Inform. 2014 Feb;47:1-10. doi: 10.1016/j.jbi.2013.12.006. Epub 2014 Jan 3.

Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets.

J Am Med Inform Assoc. 2021 Mar 1;28(3):516-532. doi: 10.1093/jamia/ocaa269.

Assessment of disease named entity recognition on a corpus of annotated sentences.

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-9-S3-S3.

Improving search over Electronic Health Records using UMLS-based query expansion through random walks.

J Biomed Inform. 2014 Oct;51:100-6. doi: 10.1016/j.jbi.2014.04.013. Epub 2014 Apr 21.

Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.

Proc AMIA Symp. 2001:17-21.

Effective grading of termhood in biomedical literature.

AMIA Annu Symp Proc. 2005;2005:809-13.

Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives.

J Biomed Inform. 2014 Apr;48:54-65. doi: 10.1016/j.jbi.2013.11.008. Epub 2013 Dec 4.

Cited by

A survey on clinical natural language processing in the United Kingdom from 2007 to 2022.

NPJ Digit Med. 2022 Dec 21;5(1):186. doi: 10.1038/s41746-022-00730-6.

Systematic review of current natural language processing methods and applications in cardiology.

Heart. 2022 May 25;108(12):909-916. doi: 10.1136/heartjnl-2021-319769.

Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies.

J Biomed Semantics. 2020 Nov 16;11(1):14. doi: 10.1186/s13326-020-00231-z.

Annotating and detecting phenotypic information for chronic obstructive pulmonary disease.

JAMIA Open. 2019 Apr 26;2(2):261-271. doi: 10.1093/jamiaopen/ooz009. eCollection 2019 Jul.

Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review.

JMIR Med Inform. 2019 Apr 27;7(2):e12239. doi: 10.2196/12239.

Annotation and detection of drug effects in text for pharmacovigilance.

J Cheminform. 2018 Aug 13;10(1):37. doi: 10.1186/s13321-018-0290-y.
