A unified framework for finding differentially expressed genes from microarray experiments.

Author information

Shaik Jahangheer S, Yeasin Mohammed

Affiliation

Department of Electrical and Computer Engineering, CVPIA Lab, University of Memphis, Memphis, TN-38152, USA.

Publication information

BMC Bioinformatics. 2007 Sep 18;8:347. doi: 10.1186/1471-2105-8-347.

DOI:10.1186/1471-2105-8-347

PMID:17877806

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2099446/

Abstract

BACKGROUND

This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The proposed framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes, and (iii) validation. The first module ranks the genes using two gene selection algorithms: (a) two-way clustering and (b) combined adaptive ranking. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion. The DEGs are then selected using false discovery rate (FDR) analysis. The third module performs a three-fold validation of the obtained DEGs. First, the robustness of the unified framework in gene selection is illustrated using FDR analysis. Second, clustering-based validation of the DEGs is performed by applying an adaptive subspace-based clustering algorithm to the training and test datasets. Finally, a projection-based visualization is used to validate the DEGs obtained with the unified framework.
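The significance-analysis step can be illustrated with a minimal sketch. It assumes the two rankers have already produced one p-value per gene (the R-test that converts ranks to p-values is not reproduced here), and it uses the Benjamini-Hochberg step-up procedure as a stand-in for the FDR analysis, which the abstract does not specify. The function names and the example p-values are hypothetical.

```python
import math

def fisher_combine(p1, p2):
    """Fuse two p-values (each in (0, 1]) with Fisher's omnibus statistic.

    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 degrees
    of freedom when both nulls hold and the tests are independent. For 4
    degrees of freedom the survival function is exp(-x/2) * (1 + x/2).
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected at FDR level alpha (BH step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-gene p-values from the two ranking methods (assumed data).
p_twoway = [0.001, 0.20, 0.03, 0.80]
p_car = [0.004, 0.50, 0.01, 0.60]

fused = [fisher_combine(a, b) for a, b in zip(p_twoway, p_car)]
selected = benjamini_hochberg(fused, alpha=0.05)
print(selected)  # indices of genes called differentially expressed
```

Fisher's method assumes the two p-values are independent. Two rankers applied to the same expression data are likely to be correlated, so the fused values in this sketch should be read as illustrative rather than calibrated.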

RESULTS

The performance of the unified framework is compared with well-known ranking algorithms: t-statistics, Significance Analysis of Microarrays (SAM), Adaptive Ranking, Combined Adaptive Ranking, and Two-way Clustering. Performance curves obtained using 50 simulated microarray datasets, each following two different distributions, indicate the superiority of the unified framework over the other reported algorithms. Further analyses on 3 real cancer datasets and 3 Parkinson's datasets show a similar improvement in performance. A three-fold validation process is provided for the two-sample cancer datasets. In addition, the analysis of the 3 Parkinson's datasets demonstrates the scalability of the proposed method to multi-sample microarray datasets.
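One way such a comparison on simulated data might be set up is sketched below for the simplest baseline, a t-statistic ranker. Everything about the simulation is an assumption made for illustration: Gaussian expression values, 200 genes of which the first 20 are truly differentially expressed, 10 samples per class, and a mean shift of 3. The abstract does not state the distributions or sizes used in the paper, and the Welch form of the t-statistic is likewise a choice made here.

```python
import random
import statistics

def simulate_two_sample(n_genes=200, n_deg=20, n_per_class=10, shift=3.0, seed=0):
    """Simulate a two-class dataset; the first n_deg genes are true DEGs."""
    rng = random.Random(seed)
    data = []
    for g in range(n_genes):
        offset = shift if g < n_deg else 0.0
        class_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        class_b = [rng.gauss(offset, 1.0) for _ in range(n_per_class)]
        data.append((class_a, class_b))
    return data, set(range(n_deg))

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def rank_genes(data):
    """Order gene indices from most to least differentially expressed."""
    scores = [abs(welch_t(a, b)) for a, b in data]
    return sorted(range(len(data)), key=lambda g: scores[g], reverse=True)

def recall_at_k(ranking, true_degs, k):
    """Fraction of the true DEGs found among the top k ranked genes."""
    return len(set(ranking[:k]) & true_degs) / len(true_degs)

data, true_degs = simulate_two_sample()
ranking = rank_genes(data)
recall = recall_at_k(ranking, true_degs, k=20)
print(f"recall of true DEGs in top 20: {recall:.2f}")
```

Plotting `recall_at_k` over a range of `k` for each ranking method, averaged over many simulated datasets, would give performance curves of the general kind the abstract refers to.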

CONCLUSION

This paper presents a unified framework for the robust selection of genes from two-sample as well as multi-sample microarray experiments. The two different ranking methods used in module 1 bring diversity to the selection of genes. The conversion of ranks to p-values, the fusion of p-values, and FDR analysis aid in identifying significant genes, whose significance cannot be judged from gene ranking alone. The three-fold validation (robustness of gene selection under FDR analysis, clustering, and visualization) demonstrates the relevance of the DEGs. Empirical analyses on 50 artificial datasets and 6 real microarray datasets illustrate the efficacy of the proposed approach. The analyses on 3 cancer datasets demonstrate its utility on microarray datasets with two classes of samples, and three sets of Parkinson's data address its scalability to multi-sample (more than two sample classes) microarray datasets. In these analyses, the unified framework outperformed the other gene selection methods in selecting differentially expressed genes from microarray data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0bc/2099446/30e1c52fc3f0/1471-2105-8-347-1.jpg
