Department of Biomedical Informatics, Columbia University, New York, NY, USA.
J Biomed Inform. 2011 Apr;44(2):289-98. doi: 10.1016/j.jbi.2011.01.005. Epub 2011 Jan 22.
Most existing controlled terminologies can be characterized as collections of terms arranged in a simple list or organized in a hierarchy. Such terminologies are considered useful for standardizing terms and encoding data, and they are used in many existing information systems. However, they suffer from a number of limitations that make data reuse difficult. It has recently been proposed that formal ontological methods can be applied to some of the problems of terminological design. Biomedical ontologies organize concepts (embodiments of knowledge about biomedical reality), whereas terminologies organize terms (what is used to code patient data at a certain point in time, based on the particular terminology version). Applying these methods to existing terminologies is not straightforward, however. The terminologies are firmly entrenched in many systems, so the seemingly simple option of replacing them is not possible. Moreover, they evolve over time to suit the needs of users. Any methodology must therefore take these constraints into consideration, which creates the need for formal methods of managing changes. Along these lines, we have developed a formal representation of the concept-term relation, around which we have also developed a methodology for managing terminology changes. The objective of this study was to determine whether our methodology would result in improved retrieval of data.
We compared two methods for retrieving data encoded with terms from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The comparison was based on recall when retrieving data for ICD-9-CM terms whose codes had changed but which had retained their original meaning (code change).
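The comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the record identifiers, the placeholder codes (`OLD.1`, `NEW.1`, `XYZ.9`), the `CONCEPT_CODES` mapping, and all function names are hypothetical, chosen only to show how recall differs between a lookup on the current code alone and a lookup on every code a concept has carried across terminology versions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: (record_id, code stored when the record was encoded).
# Records 1-2 were coded before the code change, records 3-4 after it.
RECORDS: List[Tuple[int, str]] = [
    (1, "OLD.1"),
    (2, "OLD.1"),
    (3, "NEW.1"),
    (4, "NEW.1"),
    (5, "XYZ.9"),  # unrelated term
]

# Hypothetical concept-term mapping: every code that has denoted the concept.
CONCEPT_CODES: Dict[str, Set[str]] = {"concept_A": {"OLD.1", "NEW.1"}}

# Records that truly belong to concept_A (the reference standard in this toy case).
RELEVANT: Set[int] = {1, 2, 3, 4}


def retrieve_by_code(records: List[Tuple[int, str]], code: str) -> Set[int]:
    """Baseline: match only the code as it appears in the current version."""
    return {rid for rid, stored in records if stored == code}


def retrieve_by_concept(records: List[Tuple[int, str]], concept: str,
                        concept_codes: Dict[str, Set[str]]) -> Set[int]:
    """Change-aware: match any code the concept has carried over time."""
    codes = concept_codes[concept]
    return {rid for rid, stored in records if stored in codes}


def recall(retrieved: Set[int], relevant: Set[int]) -> float:
    """Fraction of relevant records that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


baseline_recall = recall(retrieve_by_code(RECORDS, "NEW.1"), RELEVANT)
aware_recall = recall(retrieve_by_concept(RECORDS, "concept_A", CONCEPT_CODES), RELEVANT)
```

In this toy case the baseline misses the two records stored under the earlier code, so its recall is 0.5, while the change-aware lookup reaches 1.0. The numbers are illustrative only and say nothing about the study's actual results.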
The measurements were recall and the interclass correlation coefficient.
The McNemar test detected statistically significant differences (p<0.05) for two terms whose codes had changed. Furthermore, when all cases were combined into an overall category, our method also performed significantly better (p<0.05).
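For readers unfamiliar with the McNemar test, the following is a minimal sketch of its exact (binomial) form applied to paired retrieval outcomes. The abstract does not say which variant of the test was used, so the exact form is an assumption, and the function names and example counts are hypothetical rather than taken from the study.

```python
from math import comb
from typing import List, Tuple


def discordant_counts(pairs: List[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count discordant pairs.

    Each pair is (retrieved_by_method_1, retrieved_by_method_2) for one case.
    Returns (b, c): b = only method 1 retrieved it, c = only method 2 did.
    """
    b = sum(1 for m1, m2 in pairs if m1 and not m2)
    c = sum(1 for m1, m2 in pairs if m2 and not m1)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant case is equally likely to
    favor either method, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: six cases found only by the change-aware method,
# none found only by the baseline, and four found by both.
pairs = [(False, True)] * 6 + [(True, True)] * 4
b, c = discordant_counts(pairs)
p_value = mcnemar_exact(b, c)
```

Only the discordant cases affect the result; cases retrieved by both methods, or by neither, carry no information about which method is better.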
Our study shows that, when retrieving data for terms whose codes had changed but which retained their original meaning, an ontology-based ICD-9-CM data retrieval method that takes the effects of terminology changes into account achieves better recall than one that does not.