Gene set analysis using variance component tests.

Affiliation

Department of Epidemiology, Brown University, 121 South Main Street, Providence, RI 02912, USA.

Publication information

BMC Bioinformatics. 2013 Jun 28;14:210. doi: 10.1186/1471-2105-14-210.

Abstract

BACKGROUND

Gene set analyses have become increasingly important in genomic research, as many complex diseases arise from the joint contribution of alterations in numerous genes. Genes often act in concert as a functional repertoire, e.g., a biological pathway or network, and are often highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among genes. Here we propose to account for this important feature of gene sets to improve statistical power in gene set analyses.

RESULTS

We propose to model the effect of an independent variable, e.g., exposure or biological status (yes/no), on the expression values of multiple genes in a gene set using a multivariate linear regression model, in which the correlation among the genes is explicitly modeled through a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effect that assumes the regression coefficients in the multivariate linear regression model share a common distribution, and we calculate p-values using permutation and a scaled chi-square approximation. Simulations show that the type I error rate is controlled under different choices of working covariance matrix, and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS in which the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms two commonly used approaches, the global test and gene set enrichment analysis (GSEA).
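The testing idea above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' TEGS implementation: it assumes the per-gene coefficients β_g are i.i.d. with mean 0 and variance τ, treats the working covariance W as given, and tests H0: τ = 0 with the variance-component score statistic Q = ‖W⁻¹Rᵀx̃‖², where R holds the gene-wise centered expression values and x̃ is the centered independent variable. The function names (`tegs_like_statistic`, `permutation_pvalue`, `scaled_chisq_params`) are hypothetical. With W equal to the identity, Q reduces to a sum of squared per-gene scores that ignores inter-gene correlation, in the spirit of the global-test special case described above; the published statistic may differ in scaling or detail.

```python
import numpy as np


def tegs_like_statistic(Y, x, W):
    """Hypothetical variance-component score statistic Q = ||W^{-1} R^T x_c||^2.

    Y : (n, G) expression matrix (samples x genes in the set)
    x : (n,) independent variable, e.g. 0/1 exposure status
    W : (G, G) working covariance matrix (assumed symmetric positive definite)
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    x_c = x - x.mean()              # centering x absorbs the intercept
    R = Y - Y.mean(axis=0)          # residuals under H0 (intercept-only fit per gene)
    z = R.T @ x_c                   # per-gene score contributions, shape (G,)
    u = np.linalg.solve(W, z)       # W^{-1} z without forming the inverse explicitly
    return float(u @ u)


def permutation_pvalue(Y, x, W, n_perm=999, seed=0):
    """Permutation p-value: shuffle x to break any x-expression association."""
    rng = np.random.default_rng(seed)
    q_obs = tegs_like_statistic(Y, x, W)
    exceed = sum(
        tegs_like_statistic(Y, rng.permutation(x), W) >= q_obs
        for _ in range(n_perm)
    )
    # +1 in numerator and denominator counts the observed labelling itself
    return q_obs, (exceed + 1) / (n_perm + 1)


def scaled_chisq_params(Y, x, W):
    """Satterthwaite-type moment matching: Q is approximated by kappa * chi2(nu) under H0.

    Under H0, z = Y^T x_c ~ N(0, ||x_c||^2 * Sigma), so Q = z^T W^{-2} z is a
    Gaussian quadratic form; its mean and variance are matched using the sample
    covariance S as a plug-in estimate of Sigma.
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    x_c = x - x.mean()
    S = np.atleast_2d(np.cov(Y, rowvar=False))   # (G, G) sample covariance
    W_inv = np.linalg.inv(W)
    A = (x_c @ x_c) * (S @ W_inv @ W_inv)        # E[Q] = tr(A), Var[Q] = 2 tr(A^2)
    tr_a, tr_a2 = np.trace(A), np.trace(A @ A)
    kappa = tr_a2 / tr_a
    nu = tr_a ** 2 / tr_a2
    return float(kappa), float(nu)
```

Given the moment-matched parameters, an approximate p-value is `scipy.stats.chi2.sf(q / kappa, nu)`. Passing `np.eye(G)` as W ignores gene-gene correlation, while a covariance estimate closer to the truth is, per the abstract, where power gains are expected.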

CONCLUSION

We develop a gene set analysis method (TEGS) under the multivariate regression framework that directly models the interdependence of the expression values in a gene set using a working covariance matrix. TEGS outperforms two widely used methods, GSEA and the global test, on both simulated data and a diabetes microarray dataset.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6068/3776447/4222a39d05a3/1471-2105-14-210-1.jpg