Jiang Guoqian, Chute Christopher G
Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):89-102. doi: 10.1197/jamia.M2541. Epub 2008 Oct 24.
This study sought to develop and evaluate an approach for auditing the semantic completeness of the SNOMED CT contents using a formal concept analysis (FCA)-based model.
We developed a model for formalizing the normal forms of SNOMED CT expressions using FCA. Anonymous nodes identified through the analyses were retrieved from the model for evaluation. Two quasi-Poisson regression models were developed to test whether anonymous nodes can evaluate the semantic completeness of SNOMED CT contents (Model 1) and whether such completeness differs between 2 clinical domains (Model 2). The data were randomly sampled from all the contexts that could be formed in the 2 largest domains: Procedure and Clinical Finding. Case studies (n = 4) were performed on randomly selected anonymous node samples for validation.
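As a rough illustration of the FCA step described above, the following minimal sketch builds the concept lattice of a tiny formal context and picks out nodes that carry no named concept. Everything in it is an assumption made for illustration: the object names, the attribute strings, and in particular the working definition of an "anonymous node" as a lattice node whose intent is not the full attribute set of any named object. The abstract does not specify these details, so this is not the authors' implementation.

```python
def concept_intents(context):
    """Return every concept intent of a formal context.

    `context` maps each object name to its set of attributes. The intents
    of a concept lattice are exactly the intersections of object intents,
    together with the full attribute set (intent of the bottom concept).
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        # Close the family under intersection with this object's intent.
        intents |= {attrs & other for other in intents}
    return intents


def anonymous_intents(context):
    """Lattice nodes whose intent matches no named object (assumed definition)."""
    named = {frozenset(attrs) for attrs in context.values()}
    return concept_intents(context) - named


# Hypothetical toy context: three made-up concepts with made-up attribute-value pairs.
toy_context = {
    "A": {"method=excision", "site=appendix"},
    "B": {"method=excision", "site=colon"},
    "C": {"method=biopsy", "site=colon"},
}

for intent in sorted(anonymous_intents(toy_context), key=len):
    print(sorted(intent))
```

Under this assumed definition, a node such as the one with intent `{"method=excision"}` is a grouping that the lattice implies but that no named concept in the toy context fills. Note that the top and bottom nodes of the lattice also count as unnamed here; whether they should be excluded is a design choice the sketch leaves open.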
In Model 1, the outcome variable is the number of fully defined concepts within a context, while the explanatory variables are the number of lattice nodes and the number of anonymous nodes. In Model 2, the outcome variable is the number of anonymous nodes and the explanatory variables are the number of lattice nodes and a binary category for domain (Procedure/Clinical Finding).
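To make the model structure concrete, here is a minimal sketch of a quasi-Poisson fit with a log link, written with the Python standard library only. It assumes the usual quasi-Poisson recipe: coefficients are estimated as in ordinary Poisson regression, and the standard errors are then scaled by a dispersion estimate equal to the Pearson chi-square divided by the residual degrees of freedom. The function names and the numbers in the usage example are invented for illustration and are not data from the study; in practice a statistics package would normally be used instead.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quasi_poisson(rows, y, iterations=50):
    """Fit log E[y] = X beta; return (beta, dispersion, standard errors)."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n, p = len(X), len(X[0])

    def mean(beta):
        return [math.exp(sum(b * x for b, x in zip(beta, xi))) for xi in X]

    def information(mu):
        # Fisher information X^T diag(mu) X for the log link.
        return [[sum(xi[j] * xi[k] * mi for xi, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]

    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)
    for _ in range(iterations):
        mu = mean(beta)
        score = [sum(xi[j] * (yi - mi) for xi, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        step = solve(information(mu), score)  # Newton step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break

    mu = mean(beta)
    # Dispersion: Pearson chi-square over residual degrees of freedom.
    dispersion = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (n - p)
    info = information(mu)
    unit = lambda j: [1.0 if i == j else 0.0 for i in range(p)]
    # Standard errors: sqrt of dispersion times the diagonal of info^{-1}.
    se = [math.sqrt(dispersion * solve(info, unit(j))[j]) for j in range(p)]
    return beta, dispersion, se


# Made-up illustrative contexts: (lattice nodes, anonymous nodes) per context,
# with the number of fully defined concepts as the outcome (Model 1 layout).
rows = [(6, 1), (8, 3), (10, 2), (12, 6), (14, 4), (16, 9), (18, 5), (20, 11)]
fully_defined = [5, 4, 8, 5, 9, 6, 12, 8]

beta, dispersion, se = fit_quasi_poisson(rows, fully_defined)
for name, b, s in zip(["intercept", "lattice_nodes", "anonymous_nodes"], beta, se):
    print(f"{name}: coef={b:.3f} se={s:.3f}")
print(f"dispersion={dispersion:.3f}")
```

A Model 2 layout would follow the same pattern, with the number of anonymous nodes as the outcome and each row holding the number of lattice nodes plus a 0/1 indicator for domain.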
A total of 5,450 contexts from the 2 domains were collected for analyses. Our findings revealed that the number of anonymous nodes had a significant negative correlation with the number of fully defined concepts within a context (p < 0.001). Further, the Clinical Finding domain had fewer anonymous nodes than the Procedure domain (p < 0.001). Case studies demonstrated that the anonymous nodes are an effective index for auditing SNOMED CT.
The anonymous nodes retrieved from FCA-based analyses are a candidate proxy for the semantic completeness of the SNOMED CT contents. Our novel FCA-based approach can be useful for auditing the semantic completeness of SNOMED CT contents, or any large ontology, within or across domains.