School of Computer Science, University of South China, Hengyang, Hunan 421001, China.
J Biomed Inform. 2017 Nov;75:129-137. doi: 10.1016/j.jbi.2017.10.001. Epub 2017 Oct 4.
During the manual creation of large biomedical terminologies, the descendants of a concept under a particular semantic relationship may be organized rather arbitrarily, resulting in imbalances in relationship granularity. This work proposes scalable models for systematically evaluating the granularity balance of semantic relationships. We first use "parallel concepts sets (PCSs)" and two features of the paths between PCSs (their length and their strength) to design general evaluation models. From these, we derive eight concrete evaluation models generated by two specific types of PCS: the single concept set and the symmetric concepts set. We then apply these concrete models to the IS-A relationship in FMA and in SNOMED CT's Body Structure subset, as well as to the Part-Of relationship in FMA. Moreover, without loss of generality, we conduct two additional rounds of application on the Part-Of relationship after sequentially removing length redundancies and strength redundancies. Finally, we automatically evaluate the imbalances detected in the final round to identify missing concepts, misaligned relations and inconsistencies. For the IS-A relationship, we uncovered 34 missing concepts, 80 misalignments and 18 redundancies in FMA, as well as 28 missing concepts, 114 misalignments and 1 redundancy in SNOMED CT. In addition, 6,801 Part-Of imbalances were identified in FMA, including 3,246 redundancies. After these redundancies were removed from FMA, the total number of Part-Of imbalances dropped sharply to 327, including 51 missing concepts, 294 misaligned relations and 36 inconsistencies. Manual curation by the FMA project leader confirmed the effectiveness of our method in identifying curation errors.
In conclusion, the granularity balance of hierarchical semantic relationships is a valuable property to check for ontology quality assurance, and the scalable evaluation models proposed in this study are effective for this task, especially for auditing relationships with sub-hierarchies, such as the seldom-evaluated Part-Of relationship.
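The abstract does not give formal definitions of the evaluation models, so the following is only a rough, hypothetical illustration of the general idea behind a symmetric concepts set. Corresponding descendants of two symmetric concepts (for example, a left and a right body part) would be expected to sit at the same Part-Of path length below their respective roots, and a mismatch hints at a missing concept or a misaligned relation. The sketch compares path *length* only (path strength is not modeled). The toy hierarchy, the concept names, the left-to-right name-mapping rule, and the function names `path_length`, `descendants` and `length_imbalances` are all assumptions for illustration. They are not taken from FMA, SNOMED CT or the authors' implementation.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy Part-Of hierarchy: child -> list of direct parents.
# Names are illustrative only and NOT taken from FMA.
PARENTS: Dict[str, List[str]] = {
    "left upper limb": ["body"],
    "right upper limb": ["body"],
    "left hand": ["left upper limb"],
    "right hand": ["right upper limb"],
    "left thumb": ["left hand"],
    "right thumb": ["right upper limb"],  # deliberately skips "right hand"
    "left index finger": ["left hand"],   # no right-side counterpart
}


def path_length(parents: Dict[str, List[str]], start: str, ancestor: str) -> Optional[int]:
    """Shortest number of Part-Of steps from start up to ancestor (BFS), or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == ancestor:
            return dist
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


def descendants(parents: Dict[str, List[str]], root: str) -> List[str]:
    """All concepts (sorted) that reach root by following parent links."""
    return sorted(
        c for c in parents
        if c != root and path_length(parents, c, root) is not None
    )


def to_right(name: str) -> str:
    """Hypothetical counterpart rule: swap the first 'left' for 'right'."""
    return name.replace("left", "right", 1)


def length_imbalances(
    parents: Dict[str, List[str]],
    left_root: str,
    right_root: str,
    counterpart: Callable[[str], str],
) -> List[Tuple[str, str, Optional[int], Optional[int]]]:
    """Flag left descendants whose counterpart sits at a different depth
    (None means the counterpart is absent or unreachable)."""
    report = []
    for left in descendants(parents, left_root):
        right = counterpart(left)
        left_len = path_length(parents, left, left_root)
        right_len = path_length(parents, right, right_root)
        if left_len != right_len:
            report.append((left, right, left_len, right_len))
    return report


if __name__ == "__main__":
    for row in length_imbalances(PARENTS, "left upper limb", "right upper limb", to_right):
        print(row)
```

On this toy data the sketch flags "right thumb" (depth 1 instead of 2, suggesting a misaligned link that bypasses "right hand") and "right index finger" (no counterpart, suggesting a missing concept). It checks only the left-to-right direction; a fuller check would also run the comparison from the right side.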