Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
Iranian National Center for Addiction Studies (INCAS), Tehran University of Medical Sciences, Tehran, Iran.
Clin Trials. 2021 Apr;18(2):215-225. doi: 10.1177/1740774520972687. Epub 2020 Dec 1.
Secondary analysis of data from completed randomized controlled trials is a critical and efficient way to maximize the potential benefits of past research. De-identified primary data from completed randomized controlled trials have become increasingly available in recent years; however, the lack of standardized data products is a major barrier to further use of these valuable data. Pre-statistical harmonization of data structure, variables, and codebooks across randomized controlled trials would facilitate secondary data analysis, including meta-analyses and comparative effectiveness studies. We describe a pre-statistical data harmonization initiative to standardize de-identified primary data from substance use disorder treatment randomized controlled trials that were funded by the National Institute on Drug Abuse and are available on the National Institute on Drug Abuse Data Share website.
Standardized datasets and codebooks with consistent data structures, variable names, labels, and definitions were developed for 36 completed randomized controlled trials. Common data domains were identified so that data files from individual randomized controlled trials could be bundled according to related concepts. Variables were harmonized when at least two randomized controlled trials used the same instrument. The structure of the harmonized data was determined based on feedback from clinical trialists and substance use disorder research experts.
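The "at least two trials" rule can be illustrated with a minimal sketch. It is not the authors' pipeline: the trial IDs, the instrument names, and the `harmonizable_instruments` helper are hypothetical placeholders that show only how the eligibility rule filters instruments.

```python
from collections import defaultdict

# Hypothetical inventory of which instruments each trial administered.
# Trial IDs and instrument names are illustrative placeholders only.
TRIAL_INSTRUMENTS = {
    "trial_A": {"instr_A", "instr_B", "instr_C"},
    "trial_B": {"instr_B", "instr_C", "instr_D"},
    "trial_C": {"instr_A", "instr_E"},
}


def harmonizable_instruments(trial_instruments, min_trials=2):
    """Return instruments used by at least `min_trials` trials.

    The result maps each qualifying instrument to the sorted list of
    trial IDs that used it; instruments below the threshold are dropped.
    """
    used_by = defaultdict(set)
    for trial, instruments in trial_instruments.items():
        for instrument in instruments:
            used_by[instrument].add(trial)
    return {
        instrument: sorted(trials)
        for instrument, trials in sorted(used_by.items())
        if len(trials) >= min_trials
    }


if __name__ == "__main__":
    for instrument, trials in harmonizable_instruments(TRIAL_INSTRUMENTS).items():
        print(instrument, trials)
```

In this toy inventory, `instr_D` and `instr_E` each appear in only one trial, so they are excluded from harmonization.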
We created a harmonized database of variables across the 36 randomized controlled trials, with a built-in label and a brief definition for each variable. Data files from the randomized controlled trials were consistently categorized into eight domains (enrollment, demographics, adherence, adverse events, physical health measures, mental-behavioral-cognitive health measures, self-reported substance use measures, and biologic substance use measures). Standardized codebooks and concordance tables were also developed to make instruments and variables of interest easier to identify.
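One way to represent a concordance table is as a nested mapping from domain to harmonized variable to the source variable used in each trial. The sketch below assumes that layout. Only the eight domain names come from the abstract; the harmonized variable names, source variable names, trial IDs, and helper functions are hypothetical.

```python
# The eight domains listed in the abstract.
DOMAINS = (
    "enrollment",
    "demographics",
    "adherence",
    "adverse events",
    "physical health measures",
    "mental-behavioral-cognitive health measures",
    "self-reported substance use measures",
    "biologic substance use measures",
)

# Hypothetical concordance rows: (harmonized variable, domain, trial, source variable).
CONCORDANCE = [
    ("age", "demographics", "trial_A", "AGE_YRS"),
    ("age", "demographics", "trial_B", "DEM_AGE"),
    ("uds_opioid_positive", "biologic substance use measures", "trial_A", "UDS_OPI"),
    ("uds_opioid_positive", "biologic substance use measures", "trial_B", "OPIATE_RESULT"),
]


def build_concordance(rows):
    """Group rows into {domain: {harmonized_var: {trial: source_var}}}.

    Raises ValueError if a row names a domain outside the eight above.
    """
    table = {}
    for harmonized, domain, trial, source in rows:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        table.setdefault(domain, {}).setdefault(harmonized, {})[trial] = source
    return table


def source_variables(table, harmonized):
    """Return {trial: source_var} for a harmonized variable, or {} if absent."""
    for variables in table.values():
        if harmonized in variables:
            return variables[harmonized]
    return {}
```

For example, `source_variables(build_concordance(CONCORDANCE), "age")` returns `{"trial_A": "AGE_YRS", "trial_B": "DEM_AGE"}`, which shows how the differently named per-trial fields line up under one harmonized name.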
Harmonized data from randomized controlled trials of substance use disorder treatments can potentially promote future secondary analysis of completed randomized controlled trials, allow data from multiple randomized controlled trials to be combined, and provide guidance for future randomized controlled trials in substance use disorder treatment research.