A new synonym-substitution method to enrich the human phenotype ontology.

Author information

Taboada Maria, Rodriguez Hadriana, Gudivada Ranga C, Martinez Diego

Affiliations

Department of Electronics & Computer Science, University of Santiago de Compostela, Campus Vida, Santiago de Compostela, 15705, Spain.

CareCentrix, Hartford, 06103, Connecticut, USA.

Publication information

BMC Bioinformatics. 2017 Oct 10;18(1):446. doi: 10.1186/s12859-017-1858-7.

Abstract

BACKGROUND

Named entity recognition is critical for biomedical text mining, where it is not unusual to find entities labeled by a wide range of different terms. Nowadays, ontologies are one of the crucial enabling technologies in bioinformatics, providing resources for improved natural language processing tasks. However, biomedical ontology-based named entity recognition continues to be a major research problem.

RESULTS

This paper presents an automated synonym-substitution method to enrich the Human Phenotype Ontology (HPO) with new synonyms. The approach is mainly based on both the lexical properties of the terms and the hierarchical structure of the ontology. By scanning the lexical difference between a term and its descendant terms, the method can learn new names and modifiers in order to generate synonyms for the descendant terms. By searching for the exact phrases in MEDLINE, the method can automatically rule out illogical candidate synonyms. In total, 745 new terms were identified. These terms were indirectly evaluated through the concept annotations on a gold standard corpus and also by document retrieval on a collection of abstracts on hereditary diseases. A moderate improvement in the F-measure performance on the gold standard corpus was observed. Additionally, 6% more abstracts on hereditary diseases were retrieved, and this percentage was 33% higher if only the highly informative concepts were considered.
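The substitution and filtering steps described above can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes that a descendant label contains its ancestor's label as a whole-word phrase, and that candidates are produced by swapping in the ancestor's known synonyms. The function names `generate_candidates` and `keep_attested`, the toy labels, and the in-memory stand-in for an exact-phrase MEDLINE search are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Iterable, Set


def generate_candidates(
    ancestor_label: str,
    ancestor_synonyms: Iterable[str],
    descendant_label: str,
) -> Set[str]:
    """Swap each ancestor synonym into the descendant label.

    Assumption: the ancestor label occurs verbatim (case-insensitively,
    on word boundaries) inside the descendant label. If it does not,
    no candidates are produced.
    """
    pattern = re.compile(
        r"\b" + re.escape(ancestor_label) + r"\b", re.IGNORECASE
    )
    if not pattern.search(descendant_label):
        return set()

    candidates = set()
    for synonym in ancestor_synonyms:
        # A function replacement avoids backslash interpretation in `synonym`.
        candidate = pattern.sub(lambda _m, s=synonym: s, descendant_label)
        if candidate.lower() != descendant_label.lower():
            candidates.add(candidate)
    return candidates


def keep_attested(
    candidates: Iterable[str],
    phrase_occurs: Callable[[str], bool],
) -> Set[str]:
    """Keep only candidates that the supplied exact-phrase check accepts.

    `phrase_occurs` is a hypothetical hook: in the paper this role is
    played by an exact-phrase search in MEDLINE.
    """
    return {c for c in candidates if phrase_occurs(c)}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    toy_corpus = [
        "two siblings presented with disproportionate small stature",
    ]

    def occurs_in_toy_corpus(phrase: str) -> bool:
        return any(phrase.lower() in doc for doc in toy_corpus)

    candidates = generate_candidates(
        ancestor_label="short stature",
        ancestor_synonyms=["small stature", "decreased body height"],
        descendant_label="Disproportionate short stature",
    )
    print(sorted(candidates))
    print(sorted(keep_attested(candidates, occurs_in_toy_corpus)))
```

Passing the corpus check in as a callable keeps the substitution logic independent of any particular literature search service, so the same sketch could be exercised against a local index or a remote query.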

CONCLUSIONS

A synonym-substitution procedure that leverages the HPO hierarchical structure works well for a reliable and automatic extension of the terminology. The results show that the generated synonyms have a positive impact on concept recognition, mainly those synonyms corresponding to highly informative HPO terms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3974/5635572/3600720c41b0/12859_2017_1858_Fig1_HTML.jpg
