An efficient method to identify differentially expressed genes in microarray experiments.

Authors

Qin Huaizhen, Feng Tao, Harding Scott A, Tsai Chung-Jui, Zhang Shuanglin

Affiliations

Department of Mathematical Sciences, Biotechnology Research Center, School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA.

Publication

Bioinformatics. 2008 Jul 15;24(14):1583-9. doi: 10.1093/bioinformatics/btn215. Epub 2008 May 3.

DOI: 10.1093/bioinformatics/btn215

PMID: 18453554

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3607310/

Abstract

MOTIVATION

Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. Because genes are normally expressed in functionally relevant patterns, gene-expression data can be stratified and clustered into relatively homogeneous groups. Cluster-wise dimensionality reduction should therefore make it feasible to improve screening power while minimizing information loss.
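
The abstract does not spell out the clustering or reduction steps, so the following is only a minimal sketch of the general idea of cluster-wise dimensionality reduction, not the authors' algorithm. It assumes a gene cluster has already been formed (the stratification-based tight clustering step is not shown), summarizes that cluster by each sample's score on its first principal component, and compares the two conditions with a Welch t statistic. The function names, the toy data, and the choice of Welch's test are all hypothetical.

```python
import math
from statistics import mean, variance

def first_pc_scores(cluster, n_iter=200):
    """Return each sample's score on the first principal component of a gene cluster.

    cluster: list of genes; each gene is a list of expression values over the
    same samples. Genes are the variables, samples are the observations.
    """
    n_genes, n_samples = len(cluster), len(cluster[0])
    # Center every gene across samples.
    centered = [[x - mean(g) for x in g] for g in cluster]
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(centered[i][k] * centered[j][k] for k in range(n_samples)) / (n_samples - 1)
            for j in range(n_genes)]
           for i in range(n_genes)]
    # Power iteration for the leading eigenvector (the PC1 loadings). Starting
    # from a uniform vector suits positively co-expressed clusters; this sketch
    # does not handle degenerate starting vectors.
    v = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # A sample's score is the loading-weighted sum of its centered expression.
    return [sum(v[i] * centered[i][k] for i in range(n_genes)) for k in range(n_samples)]

def welch_t(a, b):
    """Welch two-sample t statistic (needs at least 2 values per group)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical three-gene cluster: samples 0-2 = control, 3-5 = treatment.
cluster = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    [2.0, 2.1, 1.8, 4.2, 4.0, 4.1],
    [0.5, 0.4, 0.6, 2.4, 2.6, 2.5],
]
scores = first_pc_scores(cluster)
print(f"PC1 Welch t = {welch_t(scores[3:], scores[:3]):.1f}")
```

Because the sign of a principal component is arbitrary, only the magnitude of the statistic is meaningful. Testing one summary per cluster instead of every gene reduces the number of tests, which is the screening-power argument made in the motivation.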

RESULTS

We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method combines a novel stratification-based tight clustering algorithm, principal component analysis, and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates, and two public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives at identifying differentially expressed genes.
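
One common way to pool information when each gene has only a few replicates is to shrink each gene's variance toward a shared value, as in the empirical Bayes moderated t statistic behind eBayes (limma), where the moderated variance is s̃² = (d0·s0² + d·s²)/(d0 + d). The sketch below illustrates that general idea only and is not the method of this paper. It assumes a gene cluster is already given, uses the cluster's average pooled variance as the shared prior s0², and takes a hypothetical prior weight `d0` from the user instead of estimating it from the data as limma does. All names and data are illustrative.

```python
import math
from statistics import mean

def pooled_variance(a, b):
    """Two-sample pooled variance for one gene and its degrees of freedom."""
    df = len(a) + len(b) - 2
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    return ss / df, df

def cluster_moderated_t(cluster, group_a, group_b, d0=4.0):
    """Moderated t statistics with variances shrunk toward the cluster average.

    cluster: list of genes, each a list of expression values over samples.
    group_a, group_b: sample indices of the two conditions.
    d0: hypothetical prior degrees of freedom (0 means no pooling); with d0 = 0
        a gene with zero variance raises ZeroDivisionError.
    """
    per_gene = []
    for gene in cluster:
        a = [gene[i] for i in group_a]
        b = [gene[i] for i in group_b]
        s2, df = pooled_variance(a, b)
        per_gene.append((mean(a) - mean(b), s2, df))
    # Shared prior variance: the average gene-wise variance within this cluster.
    s0_sq = mean(s2 for _, s2, _ in per_gene)
    scale = math.sqrt(1 / len(group_a) + 1 / len(group_b))
    stats = []
    for diff, s2, df in per_gene:
        # Weighted average of the prior variance and the gene's own variance.
        s2_tilde = (d0 * s0_sq + df * s2) / (d0 + df)
        stats.append(diff / (math.sqrt(s2_tilde) * scale))
    return stats

# Hypothetical two-gene cluster: gene 0 has an implausibly tiny variance.
cluster = [
    [1.0, 1.0001, 0.9999, 1.5, 1.5001, 1.4999],
    [1.0, 2.0, 0.0, 2.5, 3.5, 1.5],
]
print(cluster_moderated_t(cluster, [0, 1, 2], [3, 4, 5], d0=0.0))  # ordinary pooled t
print(cluster_moderated_t(cluster, [0, 1, 2], [3, 4, 5], d0=4.0))  # moderated t
```

Setting `d0=0` recovers the ordinary pooled two-sample t statistic; a larger `d0` pulls genes with implausibly small variances toward the cluster average, which tempers the extreme statistics that small-sample experiments tend to produce.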

AVAILABILITY

The C++ code to implement the proposed method is available upon request for academic use.
