Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, United States.
J Biomed Inform. 2011 Dec;44 Suppl 1:S78-S85. doi: 10.1016/j.jbi.2011.08.001. Epub 2011 Aug 5.
Binding controlled terminology to Common Data Elements (CDEs) is regarded as important for their standardization in cancer research. However, the potential of such binding has not yet been fully explored, especially with respect to quality assurance. The objective of this study was to explore whether there is a relationship between terminological annotations and the UMLS Semantic Network (SN) that can be exploited to improve those annotations. We profiled the terminological concepts associated with the standard structure of the CDEs of the NCI Cancer Data Standards Repository (caDSR) using the UMLS SN. We processed 17,798 data elements and extracted 17,526 primary object class/property concept pairs. We identified dominant semantic types for the categories "object class" and "property" and determined that the preponderance of the instances were disjoint (i.e., the intersection of semantic types between the two categories is empty). We then performed a preliminary evaluation of the data elements whose asserted primary object class/property concept pairs conflict with this observation, that is, where the semantic type of the object class fell into an SN category typically used by properties, or vice versa. In conclusion, UMLS SN-based profiling is a feasible approach to the quality assurance and accessibility of cancer study CDEs. This approach could provide useful insight into how to build quality assurance mechanisms in a metadata repository.
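The profiling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data element IDs, the toy semantic-type assignments, the `min_count` threshold for calling a type "dominant", and all function names are assumptions made for illustration. It counts semantic types per category, checks that the dominant sets are disjoint, and flags pairs whose object class carries only a property-dominant type, or vice versa.

```python
from collections import Counter

# Hypothetical toy data, NOT taken from caDSR:
# (data element id, object-class semantic types, property semantic types)
pairs = [
    ("DE1", {"Neoplastic Process"}, {"Qualitative Concept"}),
    ("DE2", {"Pharmacologic Substance"}, {"Quantitative Concept"}),
    ("DE3", {"Neoplastic Process"}, {"Temporal Concept"}),
    ("DE4", {"Pharmacologic Substance"}, {"Quantitative Concept"}),
    ("DE5", {"Neoplastic Process"}, {"Quantitative Concept"}),
    ("DE6", {"Quantitative Concept"}, {"Neoplastic Process"}),  # looks swapped
]

def profile(pairs):
    """Count how often each semantic type occurs per category."""
    oc_counts, pr_counts = Counter(), Counter()
    for _, oc_types, pr_types in pairs:
        oc_counts.update(oc_types)
        pr_counts.update(pr_types)
    return oc_counts, pr_counts

def dominant(counts, min_count=2):
    """Treat a type as dominant if it occurs at least min_count times (assumed rule)."""
    return {t for t, n in counts.items() if n >= min_count}

def flag_conflicts(pairs, oc_dom, pr_dom):
    """Return IDs whose types fall only in the other category's dominant set."""
    flagged = []
    for de_id, oc_types, pr_types in pairs:
        oc_like_property = bool(oc_types & pr_dom) and not (oc_types & oc_dom)
        pr_like_object = bool(pr_types & oc_dom) and not (pr_types & pr_dom)
        if oc_like_property or pr_like_object:
            flagged.append(de_id)
    return flagged

oc_counts, pr_counts = profile(pairs)
oc_dom, pr_dom = dominant(oc_counts), dominant(pr_counts)
flagged = flag_conflicts(pairs, oc_dom, pr_dom)
print(sorted(oc_dom), sorted(pr_dom), flagged)
```

On real data, the semantic types for each concept would come from a UMLS lookup, and flagged elements would be candidates for manual review rather than confirmed errors.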