Manduchi E, Grant G R, McKenzie S E, Overton G C, Surrey S, Stoeckert C J
Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Bioinformatics. 2000 Aug;16(8):685-98. doi: 10.1093/bioinformatics/16.8.685.
A protocol is described to attach expression patterns to genes represented in a collection of hybridization array experiments. Discrete values are used to provide an easily interpretable description of differential expression. Binning cutoffs for each sample type are chosen automatically, depending on the desired false-positive rate for the predictions of differential expression. Confidence levels are derived for the statement that changes in observed levels represent true changes in expression. We present a novel method for calculating this confidence that gives better results than the standard methods. Our method reflects the broader change of focus in the field from studying a few genes with many replicates to studying many (possibly thousands of) genes simultaneously, but with relatively few replicates. Our approach differs from standard methods in that it exploits the fact that there are many genes on the arrays. These genes are used to estimate, for each sample type, an appropriate distribution that is employed to control the false-positive rate of the predictions made. Satisfactory results can be obtained using this method with as few as two replicates.
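The idea of choosing a binning cutoff from a distribution estimated across many genes can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, purely for illustration, that replicate-to-replicate differences of log intensities across all genes approximate the noise distribution for a sample type, and that the cutoff is the empirical quantile matching the desired false-positive rate. The function names (`empirical_cutoff`, `discretize`, `expression_pattern`) and the input layout (two replicate lists per sample type, indexed by gene) are hypothetical.

```python
import math
import random


def empirical_cutoff(null_diffs, fpr):
    """Return a cutoff c such that at most a fraction `fpr` of the
    absolute null differences are strictly greater than c."""
    abs_sorted = sorted(abs(d) for d in null_diffs)
    n = len(abs_sorted)
    # Index of the (1 - fpr) empirical quantile, clamped to a valid position.
    k = max(0, min(n - 1, math.ceil((1.0 - fpr) * n) - 1))
    return abs_sorted[k]


def discretize(diff, cutoff):
    """Map a log-scale difference to -1 (down), 0 (no call), or +1 (up)."""
    if diff > cutoff:
        return 1
    if diff < -cutoff:
        return -1
    return 0


def expression_pattern(ref_reps, sample_reps, fpr=0.05):
    """Discrete per-gene calls for one sample type against a reference.

    ref_reps, sample_reps: pairs of replicate lists of log intensities,
    indexed by gene (assumed layout, not taken from the paper).
    """
    r1, r2 = ref_reps
    s1, s2 = sample_reps
    # Assumption: replicate differences over all genes stand in for the
    # noise distribution of this sample type. This is a simplification and
    # tends to be conservative, since a replicate difference is noisier
    # than a difference of replicate means.
    null_diffs = [a - b for a, b in zip(s1, s2)]
    cutoff = empirical_cutoff(null_diffs, fpr)
    calls = []
    for g in range(len(s1)):
        diff = (s1[g] + s2[g]) / 2 - (r1[g] + r2[g]) / 2
        calls.append(discretize(diff, cutoff))
    return cutoff, calls


if __name__ == "__main__":
    # Simulated data: 2000 genes, the first 20 truly up-regulated.
    rng = random.Random(0)
    n_genes = 2000
    base = [rng.gauss(8.0, 1.0) for _ in range(n_genes)]
    ref = tuple([b + rng.gauss(0.0, 0.2) for b in base] for _ in range(2))
    shifted = [b + (2.0 if g < 20 else 0.0) for g, b in enumerate(base)]
    sample = tuple([x + rng.gauss(0.0, 0.2) for x in shifted] for _ in range(2))
    cutoff, calls = expression_pattern(ref, sample, fpr=0.05)
    print("cutoff:", round(cutoff, 3), "up calls:", calls.count(1))
```

Because the cutoff comes from thousands of per-gene differences rather than from per-gene replicate variance, the sketch needs only two replicates per sample type, which mirrors the motivation stated above without reproducing the published confidence calculation.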
The method is illustrated through applications to macroarray and microarray datasets. The first application is to an erythroid development dataset that we generated using nylon filter arrays. Clones for genes whose expression is known in these cells were assigned expression patterns that agree with expectations and that are not picked up by the standard methods. Moreover, genes differentially expressed between normal and leukemic cells were identified. These included genes whose expression was altered upon induction of the leukemic cells to differentiate. The second application is to the microarray data of Alizadeh et al. (2000). Our results are in accordance with their major findings and offer confidence measures for the predictions made. They also provide new insights for further analysis.