Wernisch Lorenz, Kendall Sharon L, Soneji Shamit, Wietzorrek Andreas, Parish Tanya, Hinds Jason, Butcher Philip D, Stoker Neil G
School of Crystallography, Birkbeck College, London WC1E 7HX, UK.
Bioinformatics. 2003 Jan;19(1):53-61. doi: 10.1093/bioinformatics/19.1.53.
Microarray experiments are inherently noisy. Replication is the key to estimating realistic fold-changes despite such noise. In the analysis of the various sources of noise, the dependency structure of the replication needs to be taken into account.
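To see why the dependency structure matters, consider a balanced two-level design. This design is assumed here purely for illustration and is not necessarily the paper's exact model. There are $S$ biological samples, each hybridized to $A$ arrays, and the log-ratio for a gene is $y_{sa} = \mu + b_s + e_{sa}$, where $b_s$ (variance $\sigma_S^2$) and $e_{sa}$ (variance $\sigma_A^2$) are independent zero-mean random effects. The variance of the averaged log-ratio is then:

```latex
\bar{y} = \frac{1}{SA}\sum_{s=1}^{S}\sum_{a=1}^{A} y_{sa},
\qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma_S^2}{S} + \frac{\sigma_A^2}{SA}.
```

Treating all $SA$ arrays as independent replicates would instead give $(\sigma_S^2 + \sigma_A^2)/(SA)$. This understates the variance whenever $\sigma_S^2 > 0$ and $A > 1$. Adding more arrays per sample shrinks only the second term, which is why the balance between samples and arrays matters.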
We analyzed replicate data sets from a Mycobacterium tuberculosis trcS mutant to identify differentially expressed genes, and we suggest new methods for filtering and normalizing raw array data and for imputing missing values. Mixed ANOVA models are applied to quantify the various sources of error. Such analysis also allows us to determine the optimal number of samples and arrays. Significance values for differential expression are obtained by a hierarchical bootstrapping scheme on scaled residuals. Four highly upregulated genes, including bfrB, were analyzed further. We observed an artefact in which transcriptional readthrough from these genes led to apparent upregulation of adjacent genes.
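A two-level (hierarchical) bootstrap for a single gene can be sketched as follows. This is a generic illustration, not the YASMA implementation. Several pieces are assumptions: the nested layout (arrays within biological samples), the centering used to impose the null hypothesis of no differential expression, and the hypothetical function name `hierarchical_bootstrap_pvalue`. The residual scaling described in the paper is not reproduced.

```python
import random
from statistics import mean


def hierarchical_bootstrap_pvalue(log_ratios, n_boot=2000, seed=0):
    """Two-level bootstrap p-value for H0: mean log-ratio == 0 (one gene).

    log_ratios: list of biological samples, each a list of per-array
    log-ratios. This nested layout is a hypothetical assumption for the sketch.
    Returns (observed mean log-ratio, bootstrap p-value).
    """
    rng = random.Random(seed)
    # Observed fold-change estimate: mean of the per-sample means.
    observed = mean(mean(arrays) for arrays in log_ratios)
    # Impose the null by centering every value on the observed estimate
    # (plain residuals; no rescaling is applied here).
    centered = [[x - observed for x in arrays] for arrays in log_ratios]
    exceed = 0
    for _ in range(n_boot):
        # Level 1: resample biological samples with replacement.
        picked = [rng.choice(centered) for _ in centered]
        # Level 2: resample arrays with replacement within each chosen sample.
        boot = mean(mean(rng.choice(arrays) for _ in arrays) for arrays in picked)
        # Count bootstrap means at least as extreme as the observed one (two-sided).
        if abs(boot) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)


data = [[1.9, 2.1, 2.0], [2.2, 2.4, 2.3], [1.8, 1.7, 1.9]]
estimate, p_value = hierarchical_bootstrap_pvalue(data)
print(estimate, p_value)
```

The samples are resampled first and the arrays within each sample second. This preserves the nesting, so between-sample variability enters the bootstrap distribution instead of being averaged away as if all arrays were independent.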
All methods and data discussed are available in the package YASMA (http://www.cryst.bbk.ac.uk/wernisch/yasma.html) for the statistical data analysis system R (http://www.R-project.org).