School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA.
J Biomed Semantics. 2022 Aug 13;13(1):22. doi: 10.1186/s13326-022-00276-2.
The Vaccine Ontology (VO) is a biomedical ontology that standardizes vaccine annotation. Errors in VO propagate to the many applications that use it, so quality assurance is imperative to ensure that VO provides accurate domain knowledge to these downstream tasks. Manual review to identify and fix quality issues (such as missing hierarchical is-a relations) is challenging given the complexity of the ontology. Automated approaches are therefore highly desirable to facilitate the quality assurance of VO.
We developed an automated lexical approach that identifies potentially missing is-a relations in VO. First, we construct two types of VO concept-pairs: (1) linked; and (2) unlinked. From each concept-pair we then derive an Acquired Term Pair (ATP) based on the lexical features of its two concepts. If a linked concept-pair and an unlinked concept-pair yield the same ATP, we take this to indicate a potentially missing is-a relation between the unlinked pair of concepts.
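The matching step described above can be illustrated with a minimal Python sketch. The abstract does not specify how an ATP is computed, so the sketch assumes a simple definition: the ATP of a pair is the two sets of words left over after removing the words the two concept names share. It also assumes that "linked" means one concept is an is-a ancestor of the other. The concept names and the functions `acquired_term_pair`, `ancestors`, and `suggest_missing_is_a` are hypothetical and are not taken from VO or from the authors' implementation.

```python
from itertools import permutations

# Toy hierarchy (hypothetical, not real VO content): concept -> direct is-a parents.
TOY_PARENTS = {
    "live vaccine": ["vaccine"],
    "influenza vaccine": ["vaccine"],
    "live influenza vaccine": ["influenza vaccine"],
}

def words(name):
    """Lexical features of a concept, assumed here to be its lowercase words."""
    return frozenset(name.lower().split())

def acquired_term_pair(child, parent):
    """Assumed ATP: the words unique to each side after dropping shared words."""
    c, p = words(child), words(parent)
    return (c - p, p - c)

def ancestors(concept, parents):
    """All concepts reachable from `concept` by following is-a edges upward."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def suggest_missing_is_a(parents):
    """Return unlinked (child, parent) pairs whose ATP also occurs on a linked pair."""
    concepts = sorted(set(parents) | {p for ps in parents.values() for p in ps})
    anc = {c: ancestors(c, parents) for c in concepts}
    # ATPs obtained from linked pairs (descendant, ancestor).
    linked_atps = {acquired_term_pair(c, a) for c in concepts for a in anc[c]}
    suggestions = []
    for child, parent in permutations(concepts, 2):
        if parent in anc[child] or child in anc[parent]:
            continue  # already linked in one direction
        if acquired_term_pair(child, parent) in linked_atps:
            suggestions.append((child, parent))
    return suggestions

print(suggest_missing_is_a(TOY_PARENTS))
```

In this toy hierarchy, the linked pair ("influenza vaccine", "vaccine") and the unlinked pair ("live influenza vaccine", "live vaccine") both yield the ATP ({influenza}, {}), so the sketch flags "live influenza vaccine is-a live vaccine" as a candidate. Any such candidate would still need expert review.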
Applying this approach to version 1.1.192 of VO, we identified 232 potentially missing is-a relations. A manual review by a VO domain expert of a random sample of 70 of these relations found that 65 were valid missing is-a relations in VO (a precision of 92.86%).
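The reported precision follows directly from the review counts. As a quick check, the arithmetic can be reproduced as below; the variable names are illustrative only.

```python
# Counts taken from the manual review reported above.
reviewed = 70   # randomly sampled candidate relations
valid = 65      # candidates confirmed by the domain expert

# Precision = confirmed candidates / reviewed candidates.
precision_percent = round(valid / reviewed * 100, 2)
print(f"{precision_percent}%")
```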
The results indicate that our approach identifies missing is-a relations in VO with high precision.