Nadkarni P M, Brandt C A
Center for Medical Informatics, Yale University School of Medicine, PO Box 208009, New Haven, CT 06520-8009, USA.
Methods Inf Med. 2006;45(6):594-601.
The National Cancer Institute (NCI) has developed the Common Data Elements (CDE) to serve as a controlled vocabulary of data descriptors for cancer research, in order to facilitate data interchange and interoperability between cancer research centers. We evaluated the CDE's structure to see whether it could represent the elements necessary to support its intended purpose, and whether it could prevent errors and inconsistencies from being accidentally introduced. We also performed automated checks for certain types of content errors, which provided a rough measure of curation quality.
The evaluation used CDE content downloaded via the NCI's CDE Browser and transformed into relational database form. It covered three categories: 1) compatibility with the ISO/IEC 11179 metadata model, on which the CDE structure is based, 2) features necessary for controlled vocabulary support, and 3) support for a stated NCI goal, the setup of data collection forms for cancer research.
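As an illustration of what automated content checks over a relational form of the data might look like, the following is a minimal sketch only. The table `data_element`, its columns (`public_id`, `name`, `definition`), the function name, and the sample rows are all hypothetical; they are not the actual CDE schema or content, and the two checks shown are not necessarily the ones used in the study.

```python
import sqlite3

# Hypothetical, simplified rows: (public_id, name, definition).
SAMPLE_ROWS = [
    (1, "Patient Age", "Age of the patient at enrollment, in years."),
    (2, "patient age ", "Patient age"),
    (3, "Tumor Grade", None),
    (4, "Tumor Stage", ""),
]


def find_content_problems(rows):
    """Return (redundant_names, weak_definition_ids) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_element ("
        "public_id INTEGER PRIMARY KEY, name TEXT NOT NULL, definition TEXT)"
    )
    conn.executemany("INSERT INTO data_element VALUES (?, ?, ?)", rows)

    # Redundancy check: names that are identical once case and
    # surrounding spaces are ignored.
    redundant = conn.execute(
        "SELECT LOWER(TRIM(name)), COUNT(*) FROM data_element "
        "GROUP BY LOWER(TRIM(name)) HAVING COUNT(*) > 1 ORDER BY 1"
    ).fetchall()

    # Definition check: definition missing, empty, or merely repeating the name.
    weak = conn.execute(
        "SELECT public_id FROM data_element "
        "WHERE definition IS NULL OR TRIM(definition) = '' "
        "OR LOWER(TRIM(definition)) = LOWER(TRIM(name)) "
        "ORDER BY public_id"
    ).fetchall()

    conn.close()
    return redundant, [row[0] for row in weak]


if __name__ == "__main__":
    redundant, weak_ids = find_content_problems(SAMPLE_ROWS)
    print("Redundant names:", redundant)
    print("Elements with weak definitions:", weak_ids)
```

Checks of this kind can only flag candidates for human review; judging whether two elements are truly redundant or a definition is adequate still requires domain expertise.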
Various limitations were identified with respect to both content (inconsistency, insufficient definition of elements, redundancy) and structure, particularly the need for term and relationship support and the need for metadata supporting the explicit representation of electronic forms that use sets of common data elements.
While there are numerous positive aspects to the CDE effort, there is considerable opportunity for improvement. Our recommendations include review of existing content by diverse experts in the cancer community; integration with the NCI Thesaurus to take advantage of the latter's links to nationally used controlled vocabularies; and various schema enhancements required for electronic form support.