School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
Int J Med Inform. 2022 Nov;167:104870. doi: 10.1016/j.ijmedinf.2022.104870. Epub 2022 Sep 17.
We assess the potential of exploiting stopwords in biomedical concept names to complete the logical definitions of concepts that are not sufficiently defined.
Concepts containing stopwords are selected from the Disorder hierarchy of the Systematized NOmenclature of MEDicine Clinical Terms (SNOMED-CT). SNOMED-CT consists of two types of concepts: Fully Defined (FD) concepts, which are sufficiently defined, and Partially Defined (PD) concepts, which are not. In this work, FD concepts containing stopwords are treated as a source of ground truth for completing the definitions of lexically and semantically similar PD concepts. FD and PD concepts are analysed lexically and semantically to create sample-sets. For each FD sample-set, the mandatory attribute-relationships are computed as the intersection of the attribute-relationships of its concepts, yielding a template. Each PD sample-set is then audited against the corresponding template to identify inconsistencies in modelling style and potentially missing attribute-relationships.
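The intersection-and-audit step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mandatory_attributes` and `audit_pd`, the concept labels, and the choice to represent each concept as a plain set of attribute names are assumptions made for illustration only, and the sample data is invented rather than taken from SNOMED-CT.

```python
from typing import Dict, Set


def mandatory_attributes(fd_sample_set: Dict[str, Set[str]]) -> Set[str]:
    """Return the attributes shared by every FD concept in the sample-set.

    Assumption: each concept maps to the set of attribute names used in its
    logical definition. The intersection forms the mandatory template.
    """
    attribute_sets = list(fd_sample_set.values())
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


def audit_pd(pd_sample_set: Dict[str, Set[str]],
             template: Set[str]) -> Dict[str, Set[str]]:
    """Return, for each PD concept, the template attributes it lacks.

    Concepts that already carry every template attribute are omitted.
    """
    report = {}
    for concept, attributes in pd_sample_set.items():
        missing = template - attributes
        if missing:
            report[concept] = missing
    return report


# Invented sample data (hypothetical concept labels, illustrative attributes).
fd_sample = {
    "fd_a": {"Due to", "Finding site", "Associated morphology"},
    "fd_b": {"Due to", "Finding site"},
    "fd_c": {"Due to", "Finding site", "Causative agent"},
}
pd_sample = {
    "pd_x": {"Finding site"},
    "pd_y": {"Due to", "Finding site", "Associated morphology"},
}

template = mandatory_attributes(fd_sample)
report = audit_pd(pd_sample, template)
```

In this toy data the template contains the two attributes common to all three FD concepts, and only `pd_x` is flagged, because it lacks one of them. An empty template (no attribute shared by every FD concept) flags nothing, which would correspond to a sample-set for which no mandatory attribute-relationship can be identified.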
Lexical and semantic patterns around 11 stopwords were analysed, and 26 sample-sets were extracted for these stopwords. Mandatory attribute-relationships were identified for 24 of the 26 sample-sets. According to the resulting templates, between 62.5% and 72.22% of the PD concepts containing the stopwords "in" and "due to" were inconsistent in their modelling style and potentially missing at least one attribute-relationship.