A multivariate approach for integrating genome-wide expression data and biological knowledge.

Authors

Kong Sek Won, Pu William T, Park Peter J

Affiliation

Department of Cardiology, 300 Longwood Avenue, Boston, MA 02115, USA.

Publication information

Bioinformatics. 2006 Oct 1;22(19):2373-80. doi: 10.1093/bioinformatics/btl401. Epub 2006 Jul 28.

DOI:10.1093/bioinformatics/btl401

PMID:16877751

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2813864/

Abstract

MOTIVATION

Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions.

RESULTS

We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling's T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta and Gene Ontology. To demonstrate its performance, we apply this method to the data from two published studies, and visualize the results in the principal component space.
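As a rough illustration of the procedure described above, the following Python sketch computes a two-sample Hotelling's T² for one gene set and falls back to a projection when the gene set has more dimensions than the samples can support. It is not the authors' implementation: the function name `hotelling_t2`, the `var_kept` parameter, the use of SVD-based principal components as the orthonormal basis, and the rule for choosing the number of components are all assumptions made for this example.

```python
import numpy as np

def hotelling_t2(x, y, var_kept=0.95):
    """Two-sample Hotelling's T^2 for one gene set (illustrative sketch).

    x, y: arrays of shape (samples, genes), one array per phenotype.
    var_kept: assumed fraction of variance retained when projecting.
    """
    n1, n2 = x.shape[0], y.shape[0]
    df = n1 + n2 - 2  # degrees of freedom of the pooled covariance

    if x.shape[1] > df:
        # More genes than the samples can support: project onto the
        # leading principal directions, which form an orthonormal basis.
        pooled = np.vstack([x, y])
        centered = pooled - pooled.mean(axis=0)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        frac = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(frac, var_kept)) + 1
        k = max(1, min(k, df))  # keep the pooled covariance invertible
        basis = vt[:k].T
        x, y = x @ basis, y @ basis

    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled within-group covariance of the (possibly projected) data.
    s_pooled = ((n1 - 1) * np.cov(x, rowvar=False)
                + (n2 - 1) * np.cov(y, rowvar=False)) / df
    s_pooled = np.atleast_2d(s_pooled)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    return float(t2)
```

A larger value indicates that the two phenotype groups are better separated in the gene-set subspace. How the statistic is converted into a significance level (for example by an F approximation or by permuting phenotype labels) is not specified in the abstract and is left out of this sketch.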

Similar articles

A factor analysis model for functional genomics.

BMC Bioinformatics. 2006 Apr 21;7:216. doi: 10.1186/1471-2105-7-216.

GO-Diff: mining functional differentiation between EST-based transcriptomes.

BMC Bioinformatics. 2006 Feb 16;7:72. doi: 10.1186/1471-2105-7-72.

Hotelling's T2 multivariate profiling for detecting differential expression in microarrays.

Bioinformatics. 2005 Jul 15;21(14):3105-13. doi: 10.1093/bioinformatics/bti496. Epub 2005 May 19.

Eu.Gene Analyzer a tool for integrating gene expression data with pathway databases.

Bioinformatics. 2007 Oct 1;23(19):2631-2. doi: 10.1093/bioinformatics/btm333. Epub 2007 Jun 28.

eQTL Explorer: integrated mining of combined genetic linkage and expression experiments.

Bioinformatics. 2006 Feb 15;22(4):509-11. doi: 10.1093/bioinformatics/btk007. Epub 2005 Dec 15.

Constructing biological networks through combined literature mining and microarray analysis: a LMMA approach.

Bioinformatics. 2006 Sep 1;22(17):2143-50. doi: 10.1093/bioinformatics/btl363. Epub 2006 Jul 4.

Putting microarrays in a context: integrated analysis of diverse biological data.

Brief Bioinform. 2005 Mar;6(1):34-43. doi: 10.1093/bib/6.1.34.

Genome Expression Pathway Analysis Tool--analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context.

BMC Bioinformatics. 2007 Jun 2;8:179. doi: 10.1186/1471-2105-8-179.

Integrating whole-genome expression results into metabolic networks with Pathway Processor.

Curr Protoc Bioinformatics. 2004 May;Chapter 7:Unit 7.6. doi: 10.1002/0471250953.bi0706s05.

Cited by

Phoenics: a novel statistical approach for longitudinal metabolomic pathway analysis.

BMC Bioinformatics. 2025 Apr 16;26(1):105. doi: 10.1186/s12859-025-06118-z.

Geographically weighted linear combination test for gene-set analysis of a continuous spatial phenotype as applied to intratumor heterogeneity.

Front Cell Dev Biol. 2023 Mar 9;11:1065586. doi: 10.3389/fcell.2023.1065586. eCollection 2023.

CPA: a web-based platform for consensus pathway analysis and interactive visualization.

Nucleic Acids Res. 2021 Jul 2;49(W1):W114-W124. doi: 10.1093/nar/gkab421.

Rational drug design, synthesis, and biological evaluation of novel chiral tetrahydronaphthalene-fused spirooxindole as MDM2-CDK4 dual inhibitor against glioblastoma.

Acta Pharm Sin B. 2020 Aug;10(8):1492-1510. doi: 10.1016/j.apsb.2019.12.013. Epub 2019 Dec 27.

Gene Set Analysis: Challenges, Opportunities, and Future Research.

Front Genet. 2020 Jun 30;11:654. doi: 10.3389/fgene.2020.00654. eCollection 2020.

Predictive modelling using pathway scores: robustness and significance of pathway collections.

BMC Bioinformatics. 2019 Nov 4;20(1):543. doi: 10.1186/s12859-019-3163-0.

Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data.

BMC Bioinformatics. 2019 Oct 22;20(1):510. doi: 10.1186/s12859-019-3040-x.

Identifying significantly impacted pathways: a comprehensive review and assessment.

Genome Biol. 2019 Oct 9;20(1):203. doi: 10.1186/s13059-019-1790-4.

SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics.

Front Genet. 2019 Jun 25;10:598. doi: 10.3389/fgene.2019.00598. eCollection 2019.

Logic programming reveals alteration of key transcription factors in multiple myeloma.

Sci Rep. 2017 Aug 23;7(1):9257. doi: 10.1038/s41598-017-09378-9.
