Qin Huaizhen, Feng Tao, Harding Scott A, Tsai Chung-Jui, Zhang Shuanglin
Department of Mathematical Sciences, Biotechnology Research Center, School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA.
Bioinformatics. 2008 Jul 15;24(14):1583-9. doi: 10.1093/bioinformatics/btn215. Epub 2008 May 3.
Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. The fact that genes are normally expressed in functionally relevant patterns suggests that gene-expression data can be stratified and clustered into relatively homogeneous groups. Cluster-wise dimensionality reduction should make it feasible to improve screening power while minimizing information loss.
We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method incorporates a novel stratification-based tight clustering algorithm, principal component analysis and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates, and two from public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives for identification of differentially expressed genes.
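The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea of pooling information within a cluster of genes, not the authors' method. It assumes cluster assignments are already given (the tight clustering and principal component analysis steps are omitted), and it replaces each gene's own variance estimate with a variance pooled over all genes in its cluster before computing a two-sample t-like statistic. The function name `pooled_t_statistics` and the dictionary-based data layout are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistics(control, treated, clusters):
    """Two-sample t-like statistics using a cluster-pooled variance.

    control, treated: dict mapping gene name -> list of replicate values
                      (at least two replicates per gene per condition).
    clusters: dict mapping cluster id -> list of gene names.
              Assumed to be given; no clustering is performed here.
    Returns a dict mapping gene name -> statistic.
    """
    stats = {}
    for genes in clusters.values():
        # Pool within-condition sums of squares over every gene in the cluster.
        sum_sq = 0.0
        dof = 0
        for g in genes:
            for values in (control[g], treated[g]):
                sum_sq += variance(values) * (len(values) - 1)
                dof += len(values) - 1
        pooled_var = sum_sq / dof

        for g in genes:
            n_c, n_t = len(control[g]), len(treated[g])
            std_err = sqrt(pooled_var * (1.0 / n_c + 1.0 / n_t))
            diff = mean(treated[g]) - mean(control[g])
            # If the pooled variance is zero, report 0.0 rather than dividing by zero.
            stats[g] = diff / std_err if std_err > 0 else 0.0
    return stats


# Toy data with 3 replicates per condition (illustrative values only).
control = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 3.0, 4.0]}
treated = {"g1": [4.0, 5.0, 6.0], "g2": [2.0, 3.0, 4.0]}
clusters = {0: ["g1", "g2"]}
result = pooled_t_statistics(control, treated, clusters)
```

With only 3 replicates, a per-gene variance has 4 degrees of freedom in this toy setup, whereas the pooled estimate for a two-gene cluster has 8. This is the sense in which pooling across a homogeneous group can stabilize the statistic.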
The C++ code to implement the proposed method is available upon request for academic use.