Zhang Guo-Qiang, Zhu Wei, Sun Mengmeng, Tao Shiqiang, Bodenreider Olivier, Cui Licong
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106.
National Library of Medicine, Bethesda, MD 20892, USA.
Proc IEEE Int Conf Big Data. 2014 Oct;2014:754-759. doi: 10.1109/BigData.2014.7004301.
Non-lattice fragments often indicate structural anomalies in ontological systems and therefore mark candidate areas for subsequent quality assurance work. However, extracting non-lattice fragments from large ontological systems with a traditional sequential approach is computationally expensive, if not prohibitive. In this paper we present a general MapReduce pipeline, called MaPLE (MapReduce Pipeline for Lattice-based Evaluation), for extracting non-lattice fragments in large partially ordered sets, and demonstrate its applicability in ontology quality assurance. Using MaPLE on a 30-node local Hadoop cloud, we systematically extracted non-lattice fragments from 8 SNOMED CT versions from 2009 to 2014 (each containing over 300k concepts), with an average total computing time of less than 3 hours per version. With this dramatically reduced computing time, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions. Our change analysis showed that the average change rates on the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). This demonstrates that fragments around non-lattice pairs exhibit significantly higher rates of change in the process of ontological evolution.
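To make the central notion concrete, the sketch below finds non-lattice pairs in a small hierarchy by brute force. It assumes the usual order-theoretic reading, which the abstract itself does not spell out: two concepts form a non-lattice pair when their common descendants have more than one maximal element, so the pair has no unique greatest lower bound. The function names and the toy hierarchy are hypothetical, and this sequential version is not the MaPLE pipeline; it only illustrates the per-pair check whose cost grows quickly with hierarchy size.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of `node` in a DAG given as {parent: [children]}."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, ()))
    return seen


def non_lattice_pairs(children):
    """Map each non-lattice pair (a, b) to its set of maximal common descendants.

    Assumption: a pair is "non-lattice" when it has more than one
    maximal lower bound in the hierarchy.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    desc = {n: descendants(children, n) for n in nodes}
    # Lower set of each node: the node itself plus its strict descendants.
    down = {n: desc[n] | {n} for n in nodes}

    result = {}
    for a, b in combinations(sorted(nodes), 2):
        common = down[a] & down[b]
        # A common lower bound is maximal if no other common lower bound lies above it.
        maximal = {x for x in common if not any(x in desc[y] for y in common)}
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


# Hypothetical toy hierarchy: A and B share two incomparable children, C and D.
EXAMPLE = {"A": ["C", "D"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}

if __name__ == "__main__":
    for pair, lower_bounds in non_lattice_pairs(EXAMPLE).items():
        print(pair, sorted(lower_bounds))
```

In the toy hierarchy only the pair (A, B) is flagged, because C and D are both maximal among the common descendants. Each pair is checked independently of every other pair, which is the property that makes this kind of computation a natural candidate for distribution across many nodes.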