Yang Sunmo, Kim Chan Yeong, Hwang Sohyun, Kim Eiru, Kim Hyojin, Shim Hongseok, Lee Insuk
Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea.
Nucleic Acids Res. 2017 Jan 4;45(D1):D389-D396. doi: 10.1093/nar/gkw868. Epub 2016 Sep 26.
High-throughput array and sequencing technologies have produced unprecedented amounts of gene expression data in central public repositories, including the Gene Expression Omnibus (GEO). The sheer volume of expression data in GEO offers vast research opportunities but also poses data analysis challenges. Co-expression analysis of high-dimensional expression data has proven effective for studying gene functions, and several co-expression databases have been developed. Here, we present a new co-expression database, COEXPEDIA (www.coexpedia.org), which is distinct from other co-expression databases in three respects: (i) it contains only co-functional co-expressions that passed a rigorous statistical assessment for functional association, (ii) the co-expressions were inferred from individual studies, each of which was designed to investigate gene functions in a particular biomedical context such as a disease, and (iii) the co-expressions are associated with medical subject headings (MeSH) that provide biomedical information on anatomical, disease, and chemical relevance. COEXPEDIA currently contains approximately eight million co-expressions inferred from 384 GEO series for humans and 248 GEO series for mice. We describe how these MeSH-associated co-expressions enable the identification of diseases and drugs previously unknown to be related to a gene or gene group of interest.
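The idea of inferring co-expressions within a single study and tagging them with that study's MeSH terms can be illustrated with a minimal sketch. The abstract does not state which correlation measure or statistical assessment COEXPEDIA uses, so everything below is an assumption made for illustration: Pearson correlation stands in for the co-expression measure, a fixed cutoff (`threshold`) stands in for the functional-association assessment, and the gene names, expression values, and the function name `coexpression_links` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_links(expression, mesh_terms, threshold=0.9):
    """Return (gene_a, gene_b, r, mesh_terms) for gene pairs with r >= threshold.

    expression: dict mapping a gene symbol to its expression values across
                the samples of ONE study (one GEO series).
    mesh_terms: MeSH terms annotated to that study; attached to every link.
    threshold:  assumed stand-in for a statistical assessment, not the
                database's actual criterion.
    """
    links = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if r >= threshold:
            links.append((gene_a, gene_b, round(r, 3), tuple(mesh_terms)))
    return links


# Toy data: hypothetical genes and values, not taken from GEO.
toy_series = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises in step with GENE_A
    "GENE_C": [4.0, 1.0, 3.0, 2.0],  # does not track the other two
}
links = coexpression_links(toy_series, ["Neoplasms"])
for link in links:
    print(link)
```

Computing links per study, rather than over pooled samples, keeps each link tied to the biomedical context of the study it came from, which is what lets a MeSH term travel with the link.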