McCall Matthew N, Uppal Karan, Jaffee Harris A, Zilliox Michael J, Irizarry Rafael A
Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, 615 N Wolfe Street, Baltimore, MD 21205, USA.
Nucleic Acids Res. 2011 Jan;39(Database issue):D1011-5. doi: 10.1093/nar/gkq1259.
Various databases have harnessed the wealth of publicly available microarray data to address biological questions ranging from across-tissue differential expression to homologous gene expression. Despite their practical value, these databases rely on relative measures of expression and cannot address the most fundamental question: which genes are expressed in a given cell type? The Gene Expression Barcode is the first database to provide reliable absolute measures of expression for most annotated genes in 131 human and 89 mouse tissue types, including diseased tissue. This is made possible by a novel algorithm that leverages information from the GEO and ArrayExpress public repositories to build statistical models that convert data from a single microarray into expressed/unexpressed calls for each gene. For selected platforms, users may upload data and obtain results in a matter of seconds. The raw data, curated annotation, and code used to create our resource are also available at http://rafalab.jhsph.edu/barcode.
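To make the idea of per-gene expressed/unexpressed calls concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each gene is unexpressed in at least half of the reference samples, so that the lower half of the reference intensities approximates the unexpressed distribution. The function names `fit_unexpressed` and `barcode`, the gene labels, the intensity values, and the cutoff of 3 standard deviations are all hypothetical choices made for illustration.

```python
import math
from statistics import median


def fit_unexpressed(reference):
    """Estimate the centre and spread of the unexpressed mode for one gene.

    Assumption (not from the source): the gene is unexpressed in at least
    half of the reference samples, so the lower half of the sorted values
    approximates the unexpressed distribution.
    """
    ordered = sorted(reference)
    lower_half = ordered[: max(2, len(ordered) // 2)]
    mu = median(lower_half)
    # Use only values at or below the centre, so that expressed samples
    # leaking into the lower half do not inflate the spread estimate.
    below = [x for x in lower_half if x <= mu]
    variance = sum((x - mu) ** 2 for x in below) / len(below)
    sd = math.sqrt(variance) or 1e-6  # guard against a zero spread
    return mu, sd


def barcode(sample, models, cutoff=3.0):
    """Convert one array's intensities into 1 (expressed) / 0 (unexpressed).

    `cutoff` is a hypothetical z-score threshold, not a published value.
    """
    calls = {}
    for gene, value in sample.items():
        mu, sd = models[gene]
        calls[gene] = 1 if (value - mu) / sd > cutoff else 0
    return calls


# Hypothetical log-scale intensities from previously collected reference arrays.
reference = {
    "GENE_A": [3.0, 3.2, 2.9, 3.1, 3.0, 9.5, 9.8, 10.1],
    "GENE_B": [4.0, 4.1, 3.9, 4.2, 4.0, 4.1, 3.8, 4.0],
}
models = {gene: fit_unexpressed(values) for gene, values in reference.items()}

# A single new array is called gene by gene against the fitted models.
sample = {"GENE_A": 9.7, "GENE_B": 4.0}
calls = barcode(sample, models)
print(calls)
```

The point of the sketch is that the models are fitted once from a large reference collection, after which a single new array can be converted to calls without any comparison sample.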