RTI International, Research Triangle Park, NC, USA.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Sci Data. 2022 Sep 1;9(1):532. doi: 10.1038/s41597-022-01660-4.
Identifying relevant studies and harmonizing datasets are major hurdles for data reuse. Common Data Elements (CDEs) can help identify comparable study datasets and reduce the burden of retrospective data harmonization, but historically they have not been required. The collaborative team at PhenX and dbGaP developed an approach that uses PhenX variables as a set of CDEs to link phenotypic data and identify comparable studies in dbGaP. Variables were identified as either comparable or related, based on the data collection mode used to harmonize data across mapped datasets. We also added a CDE data field to the dbGaP data submission packet so that submitters can indicate use of PhenX and so that linkages can be annotated in the future. In total, 13,653 dbGaP variables from 521 studies were linked through PhenX variable mapping. These variable linkages can be browsed and searched in the repository through the dbGaP CDE-faceted search filter and the PhenX variable search tool. New features in dbGaP and PhenX enable investigators to identify variable linkages among dbGaP studies and reveal opportunities for cross-study analysis.