Duke Clinical Research Institute, Durham, North Carolina, United States of America.
PLoS One. 2012;7(3):e33677. doi: 10.1371/journal.pone.0033677. Epub 2012 Mar 16.
BACKGROUND: The ClinicalTrials.gov registry provides information regarding characteristics of past, current, and planned clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However, issues related to data structure, nomenclature, and changes in data collection over time present challenges to the aggregate analysis and interpretation of these data in general and to the analysis of trials according to clinical specialty in particular. Improving usability of these data could enhance the utility of ClinicalTrials.gov as a research resource.
METHODS/PRINCIPAL RESULTS: The purpose of our project was twofold. First, we sought to extend the usability of ClinicalTrials.gov for research purposes by developing a database for aggregate analysis of ClinicalTrials.gov (AACT) that contains data from the 96,346 clinical trials registered as of September 27, 2010. Second, we developed and validated a methodology for annotating studies by clinical specialty, using a custom taxonomy employing Medical Subject Headings (MeSH) terms applied by a National Library of Medicine (NLM) algorithm, as well as MeSH terms and other disease condition terms provided by study sponsors. Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false negatives were evaluated by comparing algorithmic classification with manual classification for three specialties.
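The classification and validation steps described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the term annotations, specialty names, study identifiers, and the function names `classify_study` and `count_errors` are all hypothetical, and the sketch assumes that a study is assigned to every specialty annotated on any of its condition terms.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical reviewer annotations: disease condition term -> specialties.
# In the abstract these come from clinical specialists reviewing MeSH and
# non-MeSH terms; the entries below are invented for illustration.
TERM_ANNOTATIONS: Dict[str, Set[str]] = {
    "myocardial infarction": {"cardiology"},
    "breast neoplasms": {"oncology"},
    "cardiotoxicity": {"cardiology", "oncology"},
}


def classify_study(
    condition_terms: Iterable[str],
    annotations: Dict[str, Set[str]] = TERM_ANNOTATIONS,
) -> Set[str]:
    """Return the union of specialties annotated on a study's condition terms."""
    specialties: Set[str] = set()
    for term in condition_terms:
        # Normalize case and whitespace so sponsor-entered terms can match.
        specialties |= annotations.get(term.strip().lower(), set())
    return specialties


def count_errors(
    algorithmic: Dict[str, Set[str]],
    manual: Dict[str, Set[str]],
    specialty: str,
) -> Tuple[int, int]:
    """Count (false positives, false negatives) for one specialty,
    treating the manual classification as the reference."""
    false_pos = false_neg = 0
    for study_id, manual_labels in manual.items():
        predicted = specialty in algorithmic.get(study_id, set())
        actual = specialty in manual_labels
        if predicted and not actual:
            false_pos += 1
        elif actual and not predicted:
            false_neg += 1
    return false_pos, false_neg


# Hypothetical studies and manual labels (identifiers are placeholders).
studies = {
    "study-1": ["Myocardial Infarction"],
    "study-2": ["Breast Neoplasms", "Cardiotoxicity"],
    "study-3": ["Asthma"],
}
manual = {
    "study-1": {"cardiology"},
    "study-2": {"oncology"},
    "study-3": {"pulmonology"},
}
algorithmic = {sid: classify_study(terms) for sid, terms in studies.items()}
```

Calling `count_errors(algorithmic, manual, "cardiology")` on this toy data flags `study-2` as a false positive, because one of its terms carries a cardiology annotation that the manual reviewer did not assign. This is the kind of disagreement that a comparison against manual classification is meant to surface.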
CONCLUSIONS/SIGNIFICANCE: The resulting AACT database features study design attributes parsed into discrete fields, integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (a .dmp file and text format). This publicly accessible dataset will facilitate analysis of studies and permit detailed characterization and analysis of the U.S. clinical trials enterprise as a whole. In addition, the methodology we present for creating specialty datasets may facilitate other efforts to analyze studies by specialty groups.