Elligo Health Research and Catalysis, USA.
U.S. National Cancer Institute, USA.
J Biomed Inform. 2020 Jul;107:103421. doi: 10.1016/j.jbi.2020.103421. Epub 2020 May 12.
The value of robust and responsible data sharing in clinical research and healthcare is recognized by patients, patient advocacy groups, researchers, journal editors, and the healthcare industry globally. Even with privacy and security concerns acknowledged, exchanging data (interoperability) together with its meaning (semantic interoperability) across studies and between partners has been difficult, if not elusive. For shared data to retain its value, it has been recommended that the Findable, Accessible, Interoperable, Reusable (FAIR) principles be followed. Unless appropriate data exchange standards are applied together with domain-relevant content standards and accessible, rich metadata that uses applicable terminologies, interoperability is burdened by the need for transformation and/or mapping. These obstacles to interoperability limit the findability, accessibility, and reusability of data, thus diminishing its value and making it impossible to adhere to FAIR principles. One effort to standardize data collection has been through common data elements (CDEs). CDEs are data collection units comprising one or more questions together with a set of valid values. Some CDEs contain standardized terminology concepts that define the meaning of the data, and others include links to unique terminology concept identifiers and unique identifiers for each CDE; however, CDEs are usually defined for specific projects or collaborations and lack traceable or machine-readable semantics. While the name implies that these elements are 'common', this has not necessarily been a requirement, and many CDEs have not been commonly used. The National Institutes of Health (NIH) CDEs are, in fact, a conglomerate of CDEs developed in silos by various NIH institutes. As a result, CDEs have not brought the anticipated benefit of wide-scale interoperability to the industry, nor have CDEs been widely reused.
Certain NIH institutes recommend, but do not enforce, institute-specific preferred CDEs; however, at the NIH level, an overabundance of choice, the lack of any overarching harmonization of CDEs, and inconsistency in linking them to controlled terminology or common identifiers create confusion for researchers trying to identify the best CDEs for their protocol. The problem of comparing data among studies is exacerbated when researchers select different CDEs for the same variable or data collection field. This manuscript explores the reasons for the disappointingly low adoption of CDEs and for the inability of CDEs or other clinical research standards to broadly solve the interoperability and data sharing problems. Recommendations are offered for rectifying this situation to enable responsible data sharing that will support adherence to FAIR principles and the realization of Learning Health Systems for the sake of all of us as patients.
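To make the abstract's definition of a CDE concrete, the following is a minimal Python sketch of a CDE as a question plus a set of valid values, with optional links to terminology concepts. It is an illustration under stated assumptions, not the authors' model or any NIH schema: the class names, the `semantically_aligned` helper, and every identifier (such as `EX:C100` or `STUDY-A:001`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PermissibleValue:
    code: str                         # value as stored in a study dataset
    label: str                        # human-readable meaning
    concept_id: Optional[str] = None  # hypothetical terminology concept identifier


@dataclass(frozen=True)
class CommonDataElement:
    cde_id: str                       # hypothetical unique CDE identifier
    question: str
    values: Tuple[PermissibleValue, ...]
    concept_id: Optional[str] = None  # concept that defines what the question measures

    def is_valid(self, code: str) -> bool:
        """Return True if `code` is one of this CDE's valid values."""
        return any(v.code == code for v in self.values)


def semantically_aligned(a: CommonDataElement, b: CommonDataElement) -> bool:
    """Return True only if both CDEs point to the same question concept
    and the same set of value concepts, with no unlinked values."""
    if a.concept_id is None or a.concept_id != b.concept_id:
        return False
    ids_a = {v.concept_id for v in a.values}
    ids_b = {v.concept_id for v in b.values}
    return None not in ids_a and ids_a == ids_b


# Three hypothetical studies collecting the same variable with different CDEs.
study_a = CommonDataElement(
    "STUDY-A:001", "Smoking status",
    (PermissibleValue("1", "Current smoker", "EX:C001"),
     PermissibleValue("2", "Never smoked", "EX:C002")),
    concept_id="EX:C100",
)
study_b = CommonDataElement(  # project-specific, no terminology links
    "STUDY-B:017", "Do you smoke?",
    (PermissibleValue("Y", "Yes"), PermissibleValue("N", "No")),
)
study_c = CommonDataElement(  # different codes, same linked concepts as study_a
    "STUDY-C:009", "Tobacco use",
    (PermissibleValue("CUR", "Current smoker", "EX:C001"),
     PermissibleValue("NEV", "Never smoked", "EX:C002")),
    concept_id="EX:C100",
)

print(semantically_aligned(study_a, study_b))  # False
print(semantically_aligned(study_a, study_c))  # True
```

In this sketch, `study_a` and `study_c` use different local codes but can be recognized as equivalent by a program because both link to the same concept identifiers, whereas `study_b` carries no machine-readable semantics and would need manual mapping before its data could be compared with the others.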