Lau William W, Sparks Rachel, Tsang John S
Office of Intramural Research, Center for Information Technology, National Institutes of Health, Bethesda, Maryland, USA.
Systems Genomics and Bioinformatics Unit, Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.
F1000Res. 2016 Dec 20;5:2884. doi: 10.12688/f1000research.10465.1. eCollection 2016.
The proliferation of publicly accessible large-scale biological data, together with the increasing availability of bioinformatics tools, has the potential to transform biomedical research. Here we report a crowdsourcing Jamboree that explored whether a team of volunteer biologists without formal bioinformatics training could use OMiCC, a crowdsourcing web platform that facilitates the reuse and (meta-)analysis of public gene expression data, to compile and annotate gene expression data and to design comparisons between disease and control sample groups. The Jamboree focused on several common human autoimmune diseases, including systemic lupus erythematosus (SLE), multiple sclerosis (MS), type I diabetes (DM1), and rheumatoid arthritis (RA), and the corresponding mouse models. Meta-analyses were performed in OMiCC using comparisons constructed by the participants to identify 1) gene expression signatures for each disease (disease versus healthy controls at the gene expression and biological pathway levels), 2) conserved signatures across all diseases within each species (pan-disease signatures), and 3) conserved signatures between species for each disease and across all diseases (cross-species signatures). The meta-analysis identified a large number of differentially expressed genes for each disease, with overlap observed among diseases both within and across species. Gene set/pathway enrichment analysis of upregulated genes suggested conserved signatures (e.g., interferon) across all human and mouse conditions. Our Jamboree exercise provides evidence that, when enabled by appropriate tools, a "crowd" of biologists can work together to accelerate the pace at which the increasingly large amounts of public data can be reused and meta-analyzed for generating and testing hypotheses. Our encouraging experience suggests that a similar crowdsourcing approach can be used to explore other biological questions.