Haynes Winston A, Vallania Francesco, Liu Charles, Bongen Erika, Tomczak Aurelie, Andres-Terrè Marta, Lofgren Shane, Tam Andrew, Deisseroth Cole A, Li Matthew D, Sweeney Timothy E, Khatri Purvesh
Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, USA; Biomedical Informatics Training Program, Stanford University, USA; Stanford Center for Biomedical Informatics Research, Stanford University, USA.
Pac Symp Biocomput. 2017;22:144-153. doi: 10.1142/9789813207813_0015.
A major contributor to the scientific reproducibility crisis has been that results from homogeneous, single-center studies do not generalize to heterogeneous, real-world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make multi-cohort analysis more feasible, we have assembled an analysis pipeline that implements rigorously studied meta-analysis best practices. We have also compiled the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, and made them publicly available through a novel, interactive web application. As a result, we have made both the process and the results of multi-cohort gene expression analysis more approachable for non-technical users.
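The abstract does not say which meta-analysis model the pipeline uses. As an illustration of what combining one gene across cohorts can look like, the following is a minimal sketch that assumes a per-cohort Hedges' g effect size pooled with a DerSimonian-Laird random-effects model, a common choice for heterogeneous cohorts. The function names and the example expression values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Return (g, var_g): bias-corrected standardized mean difference for one cohort."""
    n1, n2 = len(cases), len(controls)
    # Pooled standard deviation from the two sample variances.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    )
    d = (mean(cases) - mean(controls)) / pooled_sd
    # Small-sample correction factor J.
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d
    var_g = j ** 2 * ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return g, var_g


def pool_random_effects(effects, variances):
    """Return (pooled_effect, standard_error, tau2) using DerSimonian-Laird."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * e for w, e in zip(weights, effects)) / sum_w
    # Cochran's Q measures between-cohort heterogeneity.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight each cohort by within-cohort plus between-cohort variance.
    re_weights = [1 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * e for w, e in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1 / sum_re), tau2


if __name__ == "__main__":
    # Hypothetical expression values for one gene in three cohorts.
    cohorts = [
        ([7.1, 7.4, 6.9, 7.8], [6.2, 6.5, 6.0, 6.4]),
        ([8.0, 7.6, 8.3, 7.9, 8.1], [7.2, 7.5, 7.0, 7.4, 7.1]),
        ([5.9, 6.4, 6.1], [5.8, 6.0, 5.7]),
    ]
    per_cohort = [hedges_g(cases, controls) for cases, controls in cohorts]
    effects = [g for g, _ in per_cohort]
    variances = [v for _, v in per_cohort]
    print(pool_random_effects(effects, variances))
```

A random-effects model is assumed here because it lets the true effect vary between cohorts, which matches the heterogeneous populations the abstract describes. A fixed-effect model would instead treat every cohort as estimating the same underlying effect.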