Yu Hui, Tu Kang, Xie Lu, Li Yuan-Yuan
Bioinformatics Center, Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P. R. China.
J Bioinform Comput Biol. 2010 Dec;8 Suppl 1:161-75. doi: 10.1142/s0219720010005208.
For well-replicated two-conditional microarray datasets, the selection of differentially expressed (DE) genes is a well-studied computational problem, but for multi-conditional microarray datasets with limited or no replication, the same task has not been adequately addressed by previous studies. This paper applies multivariate outlier analysis to multi-conditional microarray datasets that lack replication and finds that it performs significantly better than the widely used limit fold change (LFC) model in a simulated comparative experiment. Compared with the LFC model, multivariate outlier analysis also shows improved stability against sample variations in a series of manipulated real expression datasets. Reanalysis of a real non-replicated multi-conditional expression dataset series yields satisfactory results. In conclusion, a multivariate outlier analysis algorithm such as DigOut is particularly useful for selecting DE genes from non-replicated multi-conditional gene expression datasets.
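The general idea of multivariate outlier analysis for non-replicated multi-conditional data can be illustrated with a minimal sketch. This is not the DigOut algorithm, whose details the abstract does not give. It is a generic stand-in that treats each gene as one point in condition space, estimates location and scatter from a trimmed core of genes, and ranks genes by squared Mahalanobis distance. The simulated data, the function names, and the `trim` and `n_iter` parameters are all assumptions made for illustration.

```python
import random

def column_means(rows):
    """Mean of each condition (column) over the given genes (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, means):
    """Sample covariance matrix of the conditions."""
    n, p = len(rows), len(means)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, means)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in r] for r in cov]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def mahalanobis_sq(row, means, inv_cov):
    """Squared Mahalanobis distance of one gene from the estimated centre."""
    d = [x - m for x, m in zip(row, means)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p))

def rank_outliers(rows, trim=0.2, n_iter=3):
    """Rank genes by decreasing squared Mahalanobis distance.

    Location and scatter are re-estimated a few times from the most central
    (1 - trim) fraction of genes, so that strong outliers have less
    influence on the estimates used to score them.
    """
    keep = list(range(len(rows)))
    dist = []
    for _ in range(n_iter):
        subset = [rows[i] for i in keep]
        means = column_means(subset)
        inv_cov = invert(covariance(subset, means))
        dist = [mahalanobis_sq(r, means, inv_cov) for r in rows]
        order = sorted(range(len(rows)), key=lambda i: dist[i])
        keep = order[: int(len(rows) * (1 - trim))]
    ranking = sorted(range(len(rows)), key=lambda i: -dist[i])
    return ranking, dist

# Simulated log-expression values: 200 genes x 3 conditions, one array each.
# Most genes share a common level across conditions plus small noise.
rng = random.Random(0)
expr = []
for _ in range(200):
    base = rng.gauss(8.0, 2.0)
    expr.append([base + rng.gauss(0.0, 0.2) for _ in range(3)])

# Hypothetical DE genes: shifted upward in the second condition only.
planted = {0, 1, 2, 3, 4}
for g in planted:
    expr[g][1] += 3.0

ranking, dist = rank_outliers(expr)
print("top-ranked candidate DE genes:", sorted(ranking[:5]))
```

Because the score uses the joint distribution across all conditions, a gene is flagged when its expression pattern departs from the bulk of genes, even though no condition has replicates from which a per-gene variance could be estimated. In practice a cutoff on the distance, rather than a fixed top-k, would decide how many genes to report.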