Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
Sci Data. 2019 Sep 3;6(1):166. doi: 10.1038/s41597-019-0174-7.
The field of pharmacogenomics presents great challenges for researchers who wish to make their studies reproducible and shareable. These challenges stem from the generation of large volumes of high-throughput multimodal data and from the lack of standardized workflows that are robust, scalable, and flexible enough to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language (CWL) to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean, where they have been assigned a unique Digital Object Identifier (DOI), providing a degree of data provenance and a persistent location from which the community can access and share our data.