Improved characterisation of clinical text through ontology-based vocabulary expansion.

Author information

Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.

University Hospitals Birmingham NHS Foundation Trust, University of Birmingham, Birmingham, B15 2TT, UK.

Publication information

J Biomed Semantics. 2021 Apr 12;12(1):7. doi: 10.1186/s13326-021-00241-5.

Abstract

BACKGROUND

Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same entities being described by several concepts in the same or similar contexts across several ontologies. While these concepts describe the same entities, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks.

RESULTS

We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert of a random sample of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, annotating patient visits in MIMIC-III with the expanded set of Disease Ontology labels made the semantic similarity score derived from those labels a significantly better predictor of matching first diagnosis: mean average precision was 0.88 for the unexpanded set of annotations and 0.913 for the expanded set.
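The lexical half of the expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ontologies are already loaded as plain dictionaries mapping class identifiers to label sets, it treats "strict lexical matching" as equality after case-folding and whitespace normalisation, and it omits the reasoner-enabled equivalency queries entirely. The function names and the identifiers in the example (`A:1`, `B:7`, `B:9`) are hypothetical.

```python
from collections import defaultdict


def normalise(label: str) -> str:
    """Assumed matching key: case-folded with runs of whitespace collapsed."""
    return " ".join(label.casefold().split())


def expand_labels(
    target: dict[str, set[str]],
    others: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Add to each target class the labels of any other-ontology class
    that shares at least one normalised label with it."""
    # Index classes from the other ontologies by normalised label.
    index: dict[str, set[str]] = defaultdict(set)
    for cls, labels in others.items():
        for label in labels:
            index[normalise(label)].add(cls)

    expanded: dict[str, set[str]] = {}
    for cls, labels in target.items():
        # Classes elsewhere that match any label of this class exactly.
        matched: set[str] = set()
        for label in labels:
            matched |= index.get(normalise(label), set())
        # Union the original labels with those of every matched class.
        merged = set(labels)
        for other_cls in matched:
            merged |= others[other_cls]
        expanded[cls] = merged
    return expanded


# Hypothetical example data, not taken from any real ontology.
target = {"A:1": {"breast cancer"}}
others = {
    "B:7": {"Breast  Cancer", "malignant breast neoplasm"},
    "B:9": {"lung cancer"},
}
expanded = expand_labels(target, others)
```

In this example `A:1` gains the labels of `B:7` because the two share a label once case and spacing are ignored, while `B:9` contributes nothing. A sketch like this only produces candidate synonyms; as the abstract notes, precision is below 1, so expert review may still be needed.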

CONCLUSIONS

Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
