Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables.

Author information

Zapala Matthew A, Schork Nicholas J

Affiliations

Biomedical Sciences Graduate Program and the Polymorphism Research Laboratory, Department of Psychiatry, Moores UCSD Cancer Center, Center for Human Genetics and Genomics, University of California at San Diego, La Jolla, CA 92093, USA.

Publication information

Proc Natl Acad Sci U S A. 2006 Dec 19;103(51):19430-5. doi: 10.1073/pnas.0609333103. Epub 2006 Dec 4.

DOI:10.1073/pnas.0609333103

PMID:17146048

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC1748243/

Abstract

A fundamental step in the analysis of gene expression and other high-dimensional genomic data is the calculation of the similarity or distance between pairs of individual samples in a study. If one has collected N total samples and assayed the expression level of G genes on those samples, then an N x N similarity matrix can be formed that reflects the correlation or similarity of the samples with respect to the expression values over the G genes. This matrix can then be examined for patterns via standard data reduction and cluster analysis techniques. We consider an alternative to conventional data reduction and cluster analyses of similarity matrices that is rooted in traditional linear models. This analysis method allows predictor variables collected on the samples to be related to variation in the pairwise similarity/distance values reflected in the matrix. The proposed multivariate method avoids the need for reducing the dimensions of a similarity matrix, can be used to assess relationships between the genes used to construct the matrix and additional information collected on the samples under study, and can be used to analyze individual genes or groups of genes identified in different ways. The technique can be used with any high-dimensional assay or data type and is ideally suited for testing subsets of genes defined by their participation in a biochemical pathway or other a priori grouping. We showcase the methodology using three published gene expression data sets.
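The workflow described in the abstract (build an N x N sample distance matrix from G genes, then relate predictor variables to the variation in that matrix) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption for illustration: the correlation-based distance d = sqrt(2(1 - r)), the Gower-centered matrix, the pseudo-F statistic as it is commonly written for distance-based regression, and the permutation test. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def correlation_distance(expr):
    """Sample-by-sample distance from an (N samples, G genes) array.

    Assumed distance: d = sqrt(2 * (1 - r)), with r the Pearson correlation
    between two samples across the G genes.
    """
    r = np.clip(np.corrcoef(expr), -1.0, 1.0)  # rows are treated as variables
    dist = np.sqrt(2.0 * (1.0 - r))
    np.fill_diagonal(dist, 0.0)
    return dist


def gower_center(dist):
    """Gower-centered matrix G = C A C with A = -0.5 * d^2 and C = I - 11'/N."""
    n = dist.shape[0]
    a = -0.5 * dist ** 2
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ a @ c


def pseudo_f(g, x):
    """Pseudo-F for predictors x (N, p), no intercept column; one is added here."""
    n = g.shape[0]
    design = np.column_stack([np.ones(n), x])
    m = design.shape[1]
    hat = design @ np.linalg.pinv(design)  # projection onto the design space
    resid = np.eye(n) - hat
    explained = np.trace(hat @ g @ hat)
    residual = np.trace(resid @ g @ resid)
    return (explained / (m - 1)) / (residual / (n - m))


def permutation_p_value(dist, x, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value (sample labels shuffled)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    g = gower_center(dist)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    observed = pseudo_f(g, x)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if pseudo_f(g, x[perm]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Under these assumptions, a call such as `permutation_p_value(correlation_distance(expr), group)` tests whether a sample-level variable `group` explains variation in the expression-based distances. Restricting the columns of `expr` to the genes of one pathway before computing the distance matrix gives the kind of gene-subset test the abstract mentions.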

Similar articles

Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables.

Proc Natl Acad Sci U S A. 2006 Dec 19;103(51):19430-5. doi: 10.1073/pnas.0609333103. Epub 2006 Dec 4.

Gene expression analysis in clear cell renal cell carcinoma using gene set enrichment analysis for biostatistical management.

BJU Int. 2011 Jul;108(2 Pt 2):E29-35. doi: 10.1111/j.1464-410X.2010.09794.x. Epub 2011 Mar 16.

Representative distance: a new similarity measure for class discovery from gene expression data.

IEEE Trans Nanobioscience. 2012 Dec;11(4):341-51. doi: 10.1109/TNB.2012.2208198. Epub 2012 Aug 6.

Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices.

Bioinformatics. 2005 May 15;21(10):2301-8. doi: 10.1093/bioinformatics/bti329. Epub 2005 Feb 18.

Supervised distance matrices.

Stat Appl Genet Mol Biol. 2008;7(1):Article 33. doi: 10.2202/1544-6115.1404. Epub 2008 Nov 11.

Hotelling's T2 multivariate profiling for detecting differential expression in microarrays.

Bioinformatics. 2005 Jul 15;21(14):3105-13. doi: 10.1093/bioinformatics/bti496. Epub 2005 May 19.

Exploiting sample variability to enhance multivariate analysis of microarray data.

Bioinformatics. 2007 Oct 15;23(20):2733-40. doi: 10.1093/bioinformatics/btm441. Epub 2007 Sep 7.

Nonparametric pathway-based regression models for analysis of genomic data.

Biostatistics. 2007 Apr;8(2):265-84. doi: 10.1093/biostatistics/kxl007. Epub 2006 Jun 13.

Statistical properties of multivariate distance matrix regression for high-dimensional data analysis.

Front Genet. 2012 Sep 27;3:190. doi: 10.3389/fgene.2012.00190. eCollection 2012.

Sci Rep. 2021 Nov 3;11(1):21609. doi: 10.1038/s41598-021-00678-9.

Cited by

Short-Term Probiotic Colonization Alters Molecular Dynamics of 3D Oral Biofilms.

Int J Mol Sci. 2025 Jul 3;26(13):6403. doi: 10.3390/ijms26136403.

Identification and Evaluation of the Urinary Microbiota Associated With Bladder Cancer.

Cancer Innov. 2025 May 25;4(4):e70012. doi: 10.1002/cai2.70012. eCollection 2025 Aug.

Assessment of Soil Health Through Metagenomic Analysis of Bacterial Diversity in Russian Black Soil.

Microorganisms. 2025 Apr 9;13(4):854. doi: 10.3390/microorganisms13040854.

Gut microbiota signatures of the three Mexican primate species, including hybrid populations.

PLoS One. 2025 Mar 18;20(3):e0317657. doi: 10.1371/journal.pone.0317657. eCollection 2025.

Pre-exposure of abundant species to disturbance improves resilience in microbial metacommunities.

Nat Ecol Evol. 2025 Mar;9(3):395-405. doi: 10.1038/s41559-024-02624-0. Epub 2025 Jan 17.

Soil bacterial and fungal diversity and composition respond differently to desertified system restoration.

PLoS One. 2025 Jan 6;20(1):e0309188. doi: 10.1371/journal.pone.0309188. eCollection 2025.

Patterns of antibiotic resistance genes and virulence factor genes in the gut microbiome of patients with osteoarthritis and rheumatoid arthritis.

Front Microbiol. 2024 Nov 20;15:1427313. doi: 10.3389/fmicb.2024.1427313. eCollection 2024.

Environmental Factors Drive the Biogeographic Pattern of Root Endophytic Fungal Diversity in the Arid Regions of Northwest China.

J Fungi (Basel). 2024 Sep 29;10(10):679. doi: 10.3390/jof10100679.

The aberrant tonsillar microbiota modulates autoimmune responses in rheumatoid arthritis.

JCI Insight. 2024 Aug 20;9(18):e175916. doi: 10.1172/jci.insight.175916.

Quantitative microbiome profiling reveals the developmental trajectory of the chicken gut microbiota and its connection to host metabolism.

Imeta. 2023 Apr 25;2(2):e105. doi: 10.1002/imt2.105. eCollection 2023 May.

References

A factor analysis model for functional genomics.

BMC Bioinformatics. 2006 Apr 21;7:216. doi: 10.1186/1471-2105-7-216.

Distance-based tests for homogeneity of multivariate dispersions.

Biometrics. 2006 Mar;62(1):245-53. doi: 10.1111/j.1541-0420.2005.00440.x.

Cyclin-dependent kinase 5, Munc18a and Munc18-interacting protein 1/X11alpha protein up-regulation in Alzheimer's disease.

Neuroscience. 2006;138(2):511-22. doi: 10.1016/j.neuroscience.2005.11.017. Epub 2006 Jan 18.

Microarray data analysis: from disarray to consolidation and consensus.

Nat Rev Genet. 2006 Jan;7(1):55-65. doi: 10.1038/nrg1749.

How does gene expression clustering work?

Nat Biotechnol. 2005 Dec;23(12):1499-501. doi: 10.1038/nbt1205-1499.

Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. doi: 10.1073/pnas.0506580102. Epub 2005 Sep 30.

Phylogenetic trees: visualizing, customizing and detecting incongruence.

Bioinformatics. 2005 Oct 1;21(19):3801-2. doi: 10.1093/bioinformatics/bti590. Epub 2005 Jul 19.

Adult mouse brain gene expression patterns bear an embryologic imprint.

Proc Natl Acad Sci U S A. 2005 Jul 19;102(29):10357-62. doi: 10.1073/pnas.0503357102. Epub 2005 Jul 7.

Association of cyclin-dependent kinase 5 and neuronal activators p35 and p39 complex in early-onset Alzheimer's disease.

Neurobiol Aging. 2005 Aug-Sep;26(8):1145-51. doi: 10.1016/j.neurobiolaging.2004.10.003. Epub 2004 Dec 22.

Molecular Property eXplorer: a novel approach to visualizing SAR using tree-maps and heatmaps.

J Chem Inf Model. 2005 Mar-Apr;45(2):523-32. doi: 10.1021/ci0496954.
