Towards a framework for developing semantic relatedness reference standards.

Affiliation

College of Pharmacy, University of Minnesota, Twin Cities, Minneapolis, MN 55455, USA.

Publication information

J Biomed Inform. 2011 Apr;44(2):251-65. doi: 10.1016/j.jbi.2010.10.004. Epub 2010 Oct 31.

DOI:10.1016/j.jbi.2010.10.004

PMID:21044697

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3063326/

Abstract

Our objective is to develop a framework for creating reference standards for functional testing of computerized measures of semantic relatedness. Currently, research on computerized approaches to semantic relatedness between biomedical concepts relies on reference standards created for specific purposes using a variety of methods for their analysis. In most cases, these reference standards are not publicly available and the published information provided in manuscripts that evaluate computerized semantic relatedness measurement approaches is not sufficient to reproduce the results. Our proposed framework is based on the experiences of medical informatics and computational linguistics communities and addresses practical and theoretical issues with creating reference standards for semantic relatedness. We demonstrate the use of the framework on a pilot set of 101 medical term pairs rated for semantic relatedness by 13 medical coding experts. While the reliability of this particular reference standard is in the "moderate" range, we show that using clustering and factor analyses offers a data-driven approach to finding systematic differences among raters and identifying groups of potential outliers. We test two ontology-based measures of relatedness and provide both the reference standard containing individual ratings and the R program used to analyze the ratings as open-source. Currently, these resources are intended to be used to reproduce and compare results of studies involving computerized measures of semantic relatedness. Our framework may be extended to the development of reference standards in other research areas in medical informatics including automatic classification, information retrieval from medical records and vocabulary/ontology development.
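As an illustration of how the reliability of a rated reference standard might be quantified, the sketch below computes the two-way random-effects, single-rater intraclass correlation, ICC(2,1), as defined by Shrout and Fleiss (1979). This is an assumption made for illustration: the abstract does not state which reliability statistic was used, and the authors' published analysis program is written in R, not Python. The function name `icc_2_1` and the layout of the ratings matrix (one row per term pair, one column per rater) are hypothetical choices. The demonstration data are the 6-target, 4-judge worked example from Shrout and Fleiss, not ratings from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows; each row holds the scores that every
    rater gave to one item (for example, one term pair).
    """
    n = len(ratings)                      # number of rated items
    k = len(ratings[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 items and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
SHROUT_FLEISS_EXAMPLE = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

print(round(icc_2_1(SHROUT_FLEISS_EXAMPLE), 2))  # 0.29
```

For a reference standard like the one described above, the matrix would have 101 rows and 13 columns. Whether ICC(2,1) or another variant is appropriate depends on how raters were sampled and on whether single or averaged ratings are used downstream.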

Similar articles

Measures of semantic similarity and relatedness in the biomedical domain.

J Biomed Inform. 2007 Jun;40(3):288-99. doi: 10.1016/j.jbi.2006.06.004. Epub 2006 Jun 10.

Corpus domain effects on distributional semantic modeling of medical terms.

Bioinformatics. 2016 Dec 1;32(23):3635-3644. doi: 10.1093/bioinformatics/btw529. Epub 2016 Aug 16.

AMIA Annu Symp Proc. 2010 Nov 13;2010:572-6.

Anatomy of data integration.

J Biomed Inform. 2007 Jun;40(3):252-69. doi: 10.1016/j.jbi.2006.09.001. Epub 2006 Sep 24.

Evaluating semantic similarity and relatedness over the semantic grouping of clinical term pairs.

J Biomed Inform. 2015 Apr;54:329-36. doi: 10.1016/j.jbi.2014.11.014. Epub 2014 Dec 15.

Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards.

J Biomed Semantics. 2016 Mar 3;7:10. doi: 10.1186/s13326-016-0053-5. eCollection 2016.

Ontology-based framework for electronic health records interoperability.

Stud Health Technol Inform. 2011;169:694-8.

A model-driven approach for representing clinical archetypes for Semantic Web environments.

J Biomed Inform. 2009 Feb;42(1):150-64. doi: 10.1016/j.jbi.2008.05.005. Epub 2008 May 23.

Semantic standards for the representation of medical records.

Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S76-80.

Cited by

BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights.

J Am Med Inform Assoc. 2024 Sep 1;31(9):1844-1855. doi: 10.1093/jamia/ocae029.

Quality of word and concept embeddings in targetted biomedical domains.

Heliyon. 2023 Jun 2;9(6):e16818. doi: 10.1016/j.heliyon.2023.e16818. eCollection 2023 Jun.

Medical terminology-based computing system: a lightweight post-processing solution for out-of-vocabulary multi-word terms.

Front Mol Biosci. 2022 Aug 12;9:928530. doi: 10.3389/fmolb.2022.928530. eCollection 2022.

Improving medical term embeddings using UMLS Metathesaurus.

BMC Med Inform Decis Mak. 2022 Apr 29;22(1):114. doi: 10.1186/s12911-022-01850-5.

Refining electronic medical records representation in manifold subspace.

BMC Bioinformatics. 2022 Apr 1;23(1):115. doi: 10.1186/s12859-022-04653-7.

Visualization of medical concepts represented using word embeddings: a scoping review.

BMC Med Inform Decis Mak. 2022 Mar 29;22(1):83. doi: 10.1186/s12911-022-01822-9.

A Word Pair Dataset for Semantic Similarity and Relatedness in Korean Medical Vocabulary: Reference Development and Validation.

JMIR Med Inform. 2021 Jun 24;9(6):e29667. doi: 10.2196/29667.

Comparing general and specialized word embeddings for biomedical named entity recognition.

PeerJ Comput Sci. 2021 Feb 18;7:e384. doi: 10.7717/peerj-cs.384. eCollection 2021.

Lexicon Development for COVID-19-related Concepts Using Open-source Word Embedding Sources: An Intrinsic and Extrinsic Evaluation.

JMIR Med Inform. 2021 Feb 22;9(2):e21679. doi: 10.2196/21679.

Using word embeddings to improve the privacy of clinical notes.

J Am Med Inform Assoc. 2020 Jun 1;27(6):901-907. doi: 10.1093/jamia/ocaa038.
