Multivariate correlation estimator for inferring functional relationships from replicated genome-wide data.

Author information

Zhu Dongxiao, Li Youjuan, Li Hua

Affiliation

Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, MO 64110, USA.

Publication information

Bioinformatics. 2007 Sep 1;23(17):2298-305. doi: 10.1093/bioinformatics/btm328. Epub 2007 Jun 22.

DOI:10.1093/bioinformatics/btm328

PMID:17586543

Abstract

Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to clustering functionally related biomolecules into cellular pathways. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. This is not completely satisfactory, since averaging introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small-sample data, we provide a resampling-based statistical inference procedure; for moderate- to large-sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate the advantages of the new multivariate correlation estimator over the Pearson bivariate correlation estimator using simulations and real-world data analysis examples.

AVAILABILITY

The estimator and statistical inference procedures have been implemented in an R package 'CORREP' that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/].

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
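The contrast drawn in the abstract, between averaging replicates before correlating and treating every replicate as its own variable, can be illustrated with a minimal sketch. This is not the paper's derivation and not the CORREP API. The function names, the latent-factor simulation, and the choice to average all cross-replicate Pearson correlations are assumptions made here purely for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def pearson_on_averages(x, y):
    """Average the replicate columns per sample, then correlate the two means."""
    mean_x = [sum(vals) / len(vals) for vals in zip(*x)]
    mean_y = [sum(vals) / len(vals) for vals in zip(*y)]
    return pearson(mean_x, mean_y)


def replicate_aware_correlation(x, y):
    """Illustrative alternative (an assumption, not the published estimator):
    average the Pearson correlations over all m1 * m2 cross-replicate pairs."""
    pairs = [pearson(xi, yj) for xi in x for yj in y]
    return sum(pairs) / len(pairs)


def simulate_replicates(n, m, rho, within, rng):
    """Hypothetical generator of jointly normal replicates.

    Replicates of the same molecule have correlation `within`; replicates of
    different molecules have correlation `rho`. This latent-factor
    construction requires 0 < rho <= within <= 1.
    Returns two lists of m replicate columns, each of length n.
    """
    c = rho / within  # correlation between the two latent signals
    x = [[] for _ in range(m)]
    y = [[] for _ in range(m)]
    for _ in range(n):
        sx = rng.gauss(0, 1)
        sy = c * sx + math.sqrt(1 - c * c) * rng.gauss(0, 1)
        for i in range(m):
            x[i].append(math.sqrt(within) * sx + math.sqrt(1 - within) * rng.gauss(0, 1))
            y[i].append(math.sqrt(within) * sy + math.sqrt(1 - within) * rng.gauss(0, 1))
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = simulate_replicates(n=20, m=3, rho=0.4, within=0.6, rng=rng)
    print("Pearson on averages:", pearson_on_averages(x, y))
    print("Replicate-aware:    ", replicate_aware_correlation(x, y))
```

The two functions target differently defined quantities whenever replicates are noisy, which is why their outputs generally differ on the same data. For actual analyses, the CORREP package named under AVAILABILITY is the authors' implementation.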

Similar articles

Parallelized prediction error estimation for evaluation of high-dimensional models.

Bioinformatics. 2009 Mar 15;25(6):827-9. doi: 10.1093/bioinformatics/btp062. Epub 2009 Jan 28.

Simpleaffy: a BioConductor package for Affymetrix Quality Control and data analysis.

Bioinformatics. 2005 Sep 15;21(18):3683-5. doi: 10.1093/bioinformatics/bti605. Epub 2005 Aug 2.

An improved method for bivariate meta-analysis when within-study correlations are unknown.

Res Synth Methods. 2018 Mar;9(1):73-88. doi: 10.1002/jrsm.1274. Epub 2017 Dec 7.

Exploiting sample variability to enhance multivariate analysis of microarray data.

Bioinformatics. 2007 Oct 15;23(20):2733-40. doi: 10.1093/bioinformatics/btm441. Epub 2007 Sep 7.

PCCA: a program for phylogenetic canonical correlation analysis.

Bioinformatics. 2008 Apr 1;24(7):1018-20. doi: 10.1093/bioinformatics/btn065. Epub 2008 Feb 21.

Mixtures of regression models for time course gene expression data: evaluation of initialization and random effects.

Bioinformatics. 2010 Feb 1;26(3):370-7. doi: 10.1093/bioinformatics/btp686. Epub 2009 Dec 29.

Branch and bound computation of exact p-values.

Bioinformatics. 2006 Sep 1;22(17):2158-9. doi: 10.1093/bioinformatics/btl357. Epub 2006 Aug 7.

Graph selection with GGMselect.

Stat Appl Genet Mol Biol. 2012 Feb 10;11(3):Article 3. doi: 10.1515/1544-6115.1625.

Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations.

Biometrics. 2011 Mar;67(1):116-23. doi: 10.1111/j.1541-0420.2010.01438.x.

Cited by

Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis.

BMC Genomics. 2019 Jan 22;20(1):75. doi: 10.1186/s12864-019-5433-7.

Time-resolved proteome profiling of normal lung development.

Am J Physiol Lung Cell Mol Physiol. 2018 Jul 1;315(1):L11-L24. doi: 10.1152/ajplung.00316.2017. Epub 2018 Mar 8.

Uncovering robust patterns of microRNA co-expression across cancers using Bayesian Relevance Networks.

PLoS One. 2017 Aug 17;12(8):e0183103. doi: 10.1371/journal.pone.0183103. eCollection 2017.

CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

PLoS One. 2013 Oct 23;8(10):e77429. doi: 10.1371/journal.pone.0077429. eCollection 2013.

Assessing numerical dependence in gene expression summaries with the jackknife expression difference.

PLoS One. 2012;7(8):e39570. doi: 10.1371/journal.pone.0039570. Epub 2012 Aug 2.

Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates.

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S15. doi: 10.1186/1752-0509-5-S2-S15. Epub 2011 Dec 14.

Cross-platform analysis of global microRNA expression technologies.

BMC Genomics. 2010 May 26;11:330. doi: 10.1186/1471-2164-11-330.

Effects of scanning sensitivity and multiple scan algorithms on microarray data quality.

BMC Bioinformatics. 2010 Mar 12;11:127. doi: 10.1186/1471-2105-11-127.

A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data.

BMC Genomics. 2008 Apr 24;9:188. doi: 10.1186/1471-2164-9-188.
