Agrawal Ankur, Elhanan Gai
Manhattan College, Riverdale, NY, United States.
Halfpenny Technologies Inc., Blue Bell, PA, United States.
J Biomed Inform. 2014 Feb;47:192-8. doi: 10.1016/j.jbi.2013.11.003. Epub 2013 Nov 15.
To quantify inconsistencies in the formal definitions of SNOMED CT (SCT) concepts and to evaluate a lexical method for detecting them.
Using SCT's Procedure hierarchy, we algorithmically formulated similarity sets: groups of concepts whose fully specified names have a similar lexical structure. We drew five random samples of 50 similarity sets each. Four samples were each based on a single parameter (number of parents, number of attributes, number of groups, or all three combined), and the fifth was a randomly selected control sample. Every set in every sample was reviewed for five types of formal definition inconsistency: hierarchical, attribute assignment, attribute target values, groups, and definitional.
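The abstract does not spell out how lexical similarity was defined, so the following is only a minimal sketch of one plausible grouping rule, not the authors' algorithm. It assumes that two fully specified names are similar when they have the same number of words and differ in exactly one word position. The function name `similarity_sets` and the sample names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def similarity_sets(fsns):
    """Group names that share a word count and differ in exactly one word.

    ASSUMPTION: this one-word-difference rule is illustrative; the source
    abstract does not state the actual similarity criterion.
    """
    buckets = defaultdict(set)
    for name in fsns:
        words = name.lower().split()
        for i in range(len(words)):
            # Mask one position; names sharing the masked pattern
            # can differ only at that position.
            key = tuple(words[:i] + ["*"] + words[i + 1:])
            buckets[key].add(name)
    # Keep only patterns matched by at least two distinct names.
    return [sorted(group) for group in buckets.values() if len(group) > 1]


# Hypothetical names, made up for illustration (not taken from an SCT release).
EXAMPLE_FSNS = [
    "Excision of lesion of liver (procedure)",
    "Excision of lesion of kidney (procedure)",
    "Biopsy of lesion of liver (procedure)",
    "Repair of hernia (procedure)",
]

example_sets = similarity_sets(EXAMPLE_FSNS)
for group in example_sets:
    print(group)
```

Under this rule a concept can belong to more than one set, because it may differ from different neighbours at different word positions. Whether the original method allows such overlap is not stated in the abstract.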
For the Procedure hierarchy, 2111 similarity sets were formulated, covering 18.1% of eligible concepts. The evaluation revealed that 38% (Control) to 70% (Different relationships) of the similarity sets within the samples exhibited significant inconsistencies. The rate of inconsistencies in the Different relationships sample differed from Control with high statistical significance, as did the numbers of attribute assignment and hierarchical inconsistencies within their respective samples.
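The reported rates can be turned back into counts: 38% and 70% of 50 sets correspond to 19 and 35 inconsistent sets. The abstract does not say which statistical test was used, so the sketch below assumes a pooled two-proportion z-test purely to illustrate how such a comparison might be checked. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions.

    ASSUMPTION: the choice of test is illustrative; the source abstract
    does not name the test behind its significance claim.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts derived from the reported rates: 38% and 70% of 50 sets.
z_score, p_value = two_proportion_z_test(19, 50, 35, 50)
print(f"z = {z_score:.2f}, p = {p_value:.4f}")
```

With only 50 sets per sample, an exact test such as Fisher's would be a reasonable alternative to the normal approximation used here.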
Although the formal definitions of SCT are only a minor consideration at this stage of the HITECH initiative, they are essential to the larger goal of sophisticated, meaningful use of captured clinical data. However, a significant portion of the concepts in the Procedure hierarchy, the most semantically complex hierarchy of SCT, are modeled inconsistently in a manner that affects their computability. Lexical methods can efficiently identify such inconsistencies and may allow for their algorithmic resolution.