Zhou Weizhuang, Han Lichy, Altman Russ B
Department of Bioengineering.
Biomedical Informatics Training Program.
Bioinformatics. 2017 Feb 15;33(4):522-528. doi: 10.1093/bioinformatics/btw664.
Microarray measurements of gene expression constitute a large fraction of publicly shared biological data and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes, and the HG-U133A array contains a proper subset of these (21 722 probes). When a study combines data from both platforms, the simplest approach is to compare only the genes they share, but this excludes a substantial amount of measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate that the enlarged feature space generated by our model improves performance in downstream analysis.
The gene inference model described in this paper is available as an R package (affyImpute), which can be downloaded at http://simtk.org/home/affyimpute.
Supplementary data are available at Bioinformatics online.
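The general idea of inferring platform-specific probes from shared probes can be illustrated with a minimal sketch. This is not the affyImpute implementation, and the abstract does not state which model family the authors used; ridge regression is swapped in here purely for illustration. The function names (`fit_ridge`, `impute`, `simulate`), the penalty `alpha`, the matrix sizes, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def fit_ridge(X_common, Y_unique, alpha=1.0):
    """Fit one ridge regression per target probe, all solved together.

    X_common: (n_samples, n_common) expression of probes shared by both arrays.
    Y_unique: (n_samples, n_unique) expression of probes found only on the
              larger array, measured in the training samples.
    Returns (x_mean, y_mean, W), where W has shape (n_common, n_unique).
    """
    x_mean = X_common.mean(axis=0)
    y_mean = Y_unique.mean(axis=0)
    Xc = X_common - x_mean  # centering stands in for an intercept term
    Yc = Y_unique - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    return x_mean, y_mean, W


def impute(model, X_common):
    """Predict the unmeasured probes for samples that have only shared probes."""
    x_mean, y_mean, W = model
    return (X_common - x_mean) @ W + y_mean


# Synthetic stand-in data (hypothetical sizes, far smaller than real arrays).
rng = np.random.default_rng(0)
n_common, n_unique = 20, 5
true_W = rng.normal(size=(n_common, n_unique))


def simulate(n_samples):
    """Draw samples in which unique probes depend linearly on shared probes."""
    X = rng.normal(loc=8.0, scale=1.0, size=(n_samples, n_common))
    Y = X @ true_W + rng.normal(scale=0.1, size=(n_samples, n_unique))
    return X, Y


X_train, Y_train = simulate(200)  # samples with both probe sets measured
X_test, Y_test = simulate(50)     # samples treated as having shared probes only

model = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = impute(model, X_test)

# Per-probe Pearson correlation between imputed and held-out measured values.
per_probe_r = [
    np.corrcoef(Y_pred[:, j], Y_test[:, j])[0, 1] for j in range(n_unique)
]
print("minimum per-probe correlation:", min(per_probe_r))
```

In this sketch the training samples play the role of arrays on which both probe sets were measured, and the held-out samples play the role of arrays that measured only the shared probes. Correlation with held-out measurements is one of the two checks the abstract mentions; comparing the prediction error against replicate variability would need real replicate data.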