Improved characterisation of clinical text through ontology-based vocabulary expansion.

Affiliations

Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.

University Hospitals Birmingham NHS Foundation Trust, University of Birmingham, Birmingham, B15 2TT, UK.

Publication details

J Biomed Semantics. 2021 Apr 12;12(1):7. doi: 10.1186/s13326-021-00241-5.

DOI:10.1186/s13326-021-00241-5

PMID:33845909

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8042947/

Abstract

BACKGROUND

Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, the ontology ecosystem contains redundancy: the same entity is often described by multiple concepts, in the same or similar contexts, across different ontologies. Although these concepts describe the same entity, each carries its own complementary set of metadata. Linking these definitions so that their combined metadata can be used could improve performance in ontology-based information retrieval, extraction, and analysis tasks.
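
As a rough illustration of linking redundant concepts and pooling their labels, the Python sketch below groups classes that either share an identical normalised label (a stand-in for strict lexical matching) or appear in an explicit list of equivalence pairs (a stand-in for the reasoner-enabled equivalency queries described in the Results), then gives every class the union of its group's labels. The class IDs, labels, and the `expand_labels` helper are hypothetical; this is a simplified sketch under those assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def normalise(label: str) -> str:
    # Strict matching: only case-folding and whitespace collapsing, no stemming or fuzzy edits.
    return " ".join(label.casefold().split())


class UnionFind:
    """Disjoint-set forest used to group classes that describe the same entity."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def expand_labels(
    labels: Dict[str, Set[str]],
    equivalences: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Give every class the pooled labels of all classes linked to it (hypothetical helper)."""
    uf = UnionFind()
    # Assumed stand-in for reasoner queries: explicit cross-ontology equivalence pairs.
    for a, b in equivalences:
        uf.union(a, b)
    # Strict lexical matching: classes sharing an identical normalised label are linked.
    first_owner: Dict[str, str] = {}
    for cls, names in labels.items():
        for name in names:
            key = normalise(name)
            if key in first_owner:
                uf.union(first_owner[key], cls)
            else:
                first_owner[key] = cls
    # Pool the original (unnormalised) labels within each linked group.
    pooled: Dict[str, Set[str]] = defaultdict(set)
    for cls, names in labels.items():
        pooled[uf.find(cls)].update(names)
    return {cls: set(pooled[uf.find(cls)]) for cls in labels}


# Hypothetical classes from three ontologies.
labels = {
    "ontoA:0001": {"Hypertension"},
    "ontoB:0002": {"hypertension", "High blood pressure"},
    "ontoC:0003": {"Arterial hypertension"},
    "ontoA:0004": {"Asthma"},
}
equivalences = [("ontoB:0002", "ontoC:0003")]
expanded = expand_labels(labels, equivalences)
print(sorted(expanded["ontoA:0001"]))
# ['Arterial hypertension', 'High blood pressure', 'Hypertension', 'hypertension']
```

Grouping with a disjoint-set structure makes linking transitive, which mirrors how equivalence chains across ontologies combine, but it also means a single spurious lexical match can pull unrelated groups together; a real system would need safeguards against that.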

RESULTS

We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology, reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert of a random sample of expanded synonyms for Human Phenotype Ontology classes yielded a precision of 0.912. Furthermore, annotating patient visits in MIMIC-III with the expanded set of Disease Ontology labels made the semantic similarity score derived from those annotations a significantly better predictor of a matching first diagnosis: mean average precision was 0.88 for the unexpanded set of annotations and 0.913 for the expanded set.
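
The abstract reports mean average precision (MAP) without spelling out the retrieval setup. Assuming, for illustration only, that each patient visit acts as a query, that the other visits are ranked by semantic similarity to it, and that visits sharing its first diagnosis count as relevant, MAP could be computed as in the Python sketch below. The visit IDs and rankings are hypothetical and do not come from MIMIC-III.

```python
from typing import Iterable, List, Sequence, Tuple


def average_precision(ranked_ids: Sequence[str], relevant_ids: Iterable[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant items that never appear in the ranking contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(queries: List[Tuple[Sequence[str], Iterable[str]]]) -> float:
    """MAP: the average of per-query average precision."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical queries: other visits ranked by similarity, paired with the
# visits that share the query visit's first diagnosis.
queries = [
    (["v2", "v3", "v4"], {"v2", "v4"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["v1", "v4", "v3"], {"v4"}),        # AP = (1/2) / 1 = 1/2
]
score = mean_average_precision(queries)
print(round(score, 3))  # 0.667
```

Because AP rewards relevant visits that appear near the top of the ranking, a better similarity score (for example, one computed from richer annotations) raises MAP even when the set of relevant visits is unchanged.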

CONCLUSIONS

Inter-ontology synonym expansion can greatly increase the vocabulary available for text-mining applications. Although the extended vocabulary is not perfectly accurate, in one setting it nevertheless led to a significantly improved ontology-based characterisation of patients from text. Where such errors cannot be tolerated, the technique can instead be used to generate candidate synonyms for review by a domain expert.
