Ochs Christopher, Geller James, Perl Yehoshua, Chen Yan, Xu Junchuan, Min Hua, Case James T, Wei Zhi
Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA.
Computer Information Systems Department, BMCC, CUNY, New York, New York, USA.
J Am Med Inform Assoc. 2015 May;22(3):507-18. doi: 10.1136/amiajnl-2014-003151. Epub 2014 Oct 21.
Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA.
An innovative technique is presented for scaling TQA to large hierarchies by creating a new kind of subject-based AbN, called a subtaxonomy. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA.
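The following is a minimal sketch of how a subject-based summary might be derived. It assumes the "area" idea from the authors' earlier abstraction-network work (concepts grouped by their exact set of relationship types). The abstract itself does not define the subtaxonomy at this level of detail, so the paper's construction may differ. The function names (`descendants`, `subject_subtaxonomy`) and the dictionary input format are hypothetical. The sketch restricts the hierarchy to one subject root, such as Bleeding, and then buckets the concepts under it.

```python
from collections import defaultdict, deque


def descendants(root, children):
    """Return root plus every concept reachable through IS-A child links (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def subject_subtaxonomy(subject_root, children, rel_types):
    """Group concepts under subject_root by their exact set of relationship types.

    Hypothetical input format (an assumption, not the paper's data model):
      children:  dict concept -> list of IS-A children
      rel_types: dict concept -> iterable of relationship-type names
    Returns dict frozenset(relationship types) -> sorted list of member concepts.
    """
    areas = defaultdict(list)
    for concept in descendants(subject_root, children):
        areas[frozenset(rel_types.get(concept, ()))].append(concept)
    return {key: sorted(members) for key, members in areas.items()}
```

Because only the subject's descendants are visited, the size of the summary depends on the subject subhierarchy rather than on the whole Clinical finding hierarchy. This restriction is the scaling idea the abstract describes.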
We tested the TQA methodology on a subject-based subtaxonomy of the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology uncovered significantly more erroneous concepts than a control sample, and the difference was statistically significant.
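The abstract reports 87 erroneous concepts among 300 reviewed (29%). It does not give how those concepts split between the hypothesis-guided groups and the control sample, and it does not name the statistical test used. The sketch below therefore shows only one common way to compare two error rates: a one-sided two-proportion z-test with a pooled standard error. Fisher's exact test or a chi-square test would be equally plausible choices. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(err_a, n_a, err_b, n_b):
    """One-sided test of H1: error rate of group A exceeds that of group B.

    err_a / n_a: erroneous concepts / reviewed concepts in the guided group
    err_b / n_b: erroneous concepts / reviewed concepts in the control group
    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p_a = err_a / n_a
    p_b = err_b / n_b
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

Plugging in the paper's actual per-group counts would show whether the guided groups concentrate errors. No such counts are assumed here.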
The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts within the subtaxonomy that have a higher likelihood of errors. Scalability is achieved by reviewing a large hierarchy by subject.
An innovative methodology for scaling the derivation of AbNs, together with an associated TQA methodology, was shown to perform successfully on SNOMED's largest hierarchy.