Hotelling's T2 multivariate profiling for detecting differential expression in microarrays.

Author information

Lu Yan, Liu Peng-Yuan, Xiao Peng, Deng Hong-Wen

Affiliation

Osteoporosis Research Center, Creighton University, 601 N. 30th Street, Suite 6787, Omaha, NE 68131, USA.

Publication information

Bioinformatics. 2005 Jul 15;21(14):3105-13. doi: 10.1093/bioinformatics/bti496. Epub 2005 May 19.

DOI:10.1093/bioinformatics/bti496

PMID:15905280

Abstract

The most widely used statistical methods for finding differentially expressed genes (DEGs) are essentially univariate. In this study, we present a new T2 statistic for analyzing microarray data. We implemented our method using a multiple forward search (MFS) algorithm that is designed for selecting a subset of feature vectors in high-dimensional microarray datasets. The proposed T2 statistic is a corollary to that originally developed for multivariate analyses and possesses two prominent statistical properties. First, our method takes into account the multidimensional structure of microarray data. Using the information hidden in gene interactions makes it possible to find genes whose differential expression is not marginally detectable by univariate testing methods. Second, the statistic has a close relationship to discriminant analyses for classification of gene expression patterns. Our search algorithm sequentially maximizes gene expression difference/distance between two groups of genes. Including such a set of DEGs among the initial feature variables may increase the power of classification rules. We validated our method using a spike-in HGU95 dataset from Affymetrix. The utility of the new method was demonstrated by applying it to analyses of gene expression patterns in human liver cancers and breast cancers. Extensive bioinformatics analyses and cross-validation of the DEGs identified in the application datasets showed the significant advantages of our new algorithm.
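To make the idea concrete, the following is a minimal sketch of the classical two-sample Hotelling's T2 statistic combined with a simple greedy forward selection of genes. It is not the authors' MFS algorithm, whose details are not given in the abstract. The function names `hotelling_t2` and `forward_search`, the fixed `max_genes` stopping rule, and the use of a pseudo-inverse for near-singular covariance matrices are all assumptions made for illustration.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Classical two-sample Hotelling's T2.

    x1, x2: arrays of shape (samples, genes), one array per group.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.shape[0], x2.shape[0]

    # Difference between the two group mean vectors.
    diff = x1.mean(axis=0) - x2.mean(axis=0)

    # Pooled within-group covariance matrix.
    d1 = x1 - x1.mean(axis=0)
    d2 = x2 - x2.mean(axis=0)
    pooled = (d1.T @ d1 + d2.T @ d2) / (n1 + n2 - 2)

    # Assumption: a pseudo-inverse is used so that a near-singular
    # covariance matrix does not raise an error.
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.pinv(pooled) @ diff)


def forward_search(x1, x2, max_genes):
    """Greedy forward selection (illustrative, not the published MFS).

    At each step, add the gene that maximizes T2 of the selected set.
    max_genes should stay well below n1 + n2 - 2, otherwise the pooled
    covariance matrix becomes singular.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    selected = []
    remaining = list(range(x1.shape[1]))
    best_t2 = 0.0
    while remaining and len(selected) < max_genes:
        # Score every candidate gene jointly with the genes already chosen.
        scores = [
            (hotelling_t2(x1[:, selected + [g]], x2[:, selected + [g]]), g)
            for g in remaining
        ]
        best_t2, best_gene = max(scores)
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected, best_t2
```

Calling `forward_search(x1, x2, max_genes=k)` returns the indices of the chosen genes and the T2 value of that set. As a standard textbook result (not taken from the abstract), under multivariate normality with equal covariance matrices, T2 multiplied by (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) follows an F distribution with p and n1 + n2 - p - 1 degrees of freedom, where p is the number of selected genes. That reference distribution does not account for the selection step, so a resampling-based assessment would be needed in practice.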

Similar articles

Multivariate exploratory tools for microarray data analysis.

Biostatistics. 2003 Oct;4(4):555-67. doi: 10.1093/biostatistics/4.4.555.

Exploiting sample variability to enhance multivariate analysis of microarray data.

Bioinformatics. 2007 Oct 15;23(20):2733-40. doi: 10.1093/bioinformatics/btm441. Epub 2007 Sep 7.

Increased power of microarray analysis by use of an algorithm based on a multivariate procedure.

Bioinformatics. 2005 Sep 1;21(17):3530-4. doi: 10.1093/bioinformatics/bti570. Epub 2005 Jul 5.

Microarray data analysis: a hierarchical T-test to handle heteroscedasticity.

Appl Bioinformatics. 2004;3(4):229-35.

Sample size for FDR-control in microarray data analysis.

Bioinformatics. 2005 Jul 15;21(14):3097-104. doi: 10.1093/bioinformatics/bti456. Epub 2005 Apr 21.

Detecting differential gene expression with a semiparametric hierarchical mixture method.

Biostatistics. 2004 Apr;5(2):155-76. doi: 10.1093/biostatistics/5.2.155.

What should be expected from feature selection in small-sample settings.

Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.

Practical FDR-based sample size calculations in microarray experiments.

Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2.

Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms.

Bioinformatics. 2006 Feb 1;22(3):310-6. doi: 10.1093/bioinformatics/bti789. Epub 2005 Nov 22.

Cited by

A Network Toxicology Approach for Mechanistic Modelling of Nanomaterial Hazard and Adverse Outcomes.

Adv Sci (Weinh). 2024 Aug;11(32):e2400389. doi: 10.1002/advs.202400389. Epub 2024 Jun 25.

A Rat Model of Clinically Relevant Extracorporeal Circulation Develops Early Organ Dysfunctions.

Int J Mol Sci. 2023 Apr 16;24(8):7338. doi: 10.3390/ijms24087338.

MDEHT: a multivariate approach for detecting differential expression of microRNA isoform data in RNA-sequencing studies.

Bioinformatics. 2020 May 1;36(9):2657-2664. doi: 10.1093/bioinformatics/btaa015.

Model-free feature screening for categorical outcomes: Nonlinear effect detection and false discovery rate control.

PLoS One. 2019 May 31;14(5):e0217463. doi: 10.1371/journal.pone.0217463. eCollection 2019.

Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis.

BMC Bioinformatics. 2017 Jul 11;18(1):336. doi: 10.1186/s12859-017-1737-2.

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data.

Genomics Inform. 2016 Dec;14(4):187-195. doi: 10.5808/GI.2016.14.4.187. Epub 2016 Dec 30.

A decision analysis model for KEGG pathway analysis.

BMC Bioinformatics. 2016 Oct 6;17(1):407. doi: 10.1186/s12859-016-1285-1.

On point estimation of the abnormality of a Mahalanobis index.

Comput Stat Data Anal. 2016 Jul;99:115-130. doi: 10.1016/j.csda.2016.01.014.

Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification.

Oncotarget. 2016 Aug 23;7(34):54572-54582. doi: 10.18632/oncotarget.9788.

Application of biclustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials.

Beilstein J Nanotechnol. 2015 Dec 21;6:2438-48. doi: 10.3762/bjnano.6.252. eCollection 2015.
