Zhang Guo-Qiang, Bodenreider Olivier
Case Western Reserve University, Cleveland, OH 44106, USA.
National Library of Medicine, Bethesda, MD 20892, USA.
Semant Web ISWC. 2010;6497:273-288. doi: 10.1007/978-3-642-17749-1_18.
We present a scalable, SPARQL-based computational pipeline for testing the lattice-theoretic properties of partial orders represented as RDF triples. The use case for this work is quality assurance in biomedical ontologies, one desirable property of which is conformance to lattice structures. At the core of our pipeline is the algorithm called NuMi, for detecting the Number of Minimal upper bounds of any pair of elements in a given finite partial order. Our technical contribution is the coding of NuMi completely in SPARQL. To show its scalability, we applied NuMi to the entirety of SNOMED CT, the largest clinical ontology (over 300,000 concepts). Our experimental results have been groundbreaking: for the first time, all non-lattice pairs in SNOMED CT have been identified exhaustively from 34 million candidate pairs using over 2.5 billion queries issued to Virtuoso. The percentage of non-lattice pairs ranges from 0 to 1.66 among the 19 SNOMED CT hierarchies. These non-lattice pairs represent target areas for focused curation by domain experts. RDF, SPARQL and related tooling provide an efficient platform for implementing lattice algorithms on large data structures.
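The idea of counting minimal upper bounds can be illustrated with a small in-memory sketch. This is not the authors' SPARQL implementation: it is a plain Python illustration that assumes the partial order is given as a hypothetical `parents` mapping (child to direct parents) and that a pair counts as non-lattice when it has more than one minimal upper bound. The names `ancestors`, `minimal_upper_bounds`, `non_lattice_pairs`, and `toy_parents` are made up for this example.

```python
from itertools import combinations


def ancestors(parents, node):
    """Return the node itself plus everything reachable via parent edges."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def minimal_upper_bounds(parents, a, b):
    """Return the common upper bounds of a and b that have no other
    common upper bound strictly below them."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        u
        for u in common
        # u is not minimal if some other common bound v lies below it,
        # i.e. u appears among the ancestors of v.
        if not any(v != u and u in ancestors(parents, v) for v in common)
    }


def non_lattice_pairs(parents, nodes):
    """List pairs with more than one minimal upper bound (assumed criterion)."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if len(minimal_upper_bounds(parents, a, b)) > 1
    ]


# Toy hierarchy (hypothetical): a and b each have both c and d as parents,
# and c and d share the single parent root.
toy_parents = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["root"],
    "d": ["root"],
}
toy_nodes = {"a", "b", "c", "d", "root"}

print(minimal_upper_bounds(toy_parents, "a", "b"))
print(non_lattice_pairs(toy_parents, toy_nodes))
```

In the toy hierarchy, `a` and `b` share the incomparable upper bounds `c` and `d`, so the pair has no unique least upper bound. This brute-force version recomputes ancestor sets for every check and would not scale to an ontology the size of SNOMED CT; it is meant only to make the tested property concrete.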