Statistical significance analysis of longitudinal gene expression data.

Author information

Guo Xu, Qi Huilin, Verfaillie Catherine M, Pan Wei

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN 55455-0378, USA.

Publication information

Bioinformatics. 2003 Sep 1;19(13):1628-35. doi: 10.1093/bioinformatics/btg206.

Abstract

MOTIVATION

Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known from other applications that valid statistical analyses must appropriately account for the possible within-subject correlation in longitudinal data.

RESULTS

We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels with the proposed statistic either by incorporating the idea of the significance analysis of microarrays (SAM) method or by using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.
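The abstract does not give the statistic's formula, so the following Python sketch is only a rough illustration of an ordinary robust (sandwich) Wald statistic for a single gene. It is not the authors' variant. It assumes a balanced design in which every subject is measured at the same T time points, one mean per time point, and a working-independence estimating equation. Under those assumptions the sandwich covariance reduces to the empirical covariance of each subject's whole residual vector divided by n². The function names `robust_wald` and `solve` are hypothetical. Significance is left out: the abstract describes calibrating it through a SAM-style procedure or a mixture model, and neither is reproduced here.

```python
"""Sketch: robust (sandwich) Wald statistic for temporal change in one gene.

Assumptions (illustrative, not taken from the paper):
- balanced design: every subject is measured at the same T time points;
- cell-means coding (one mean per time point) fitted with a
  working-independence estimating equation, so each estimate is a sample mean;
- H0: the mean expression is the same at all T time points.
"""
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def robust_wald(y: List[List[float]]) -> float:
    """y[i][t] = expression of subject i at time t (n subjects, T times)."""
    n, T = len(y), len(y[0])
    # Working-independence estimating equation solution: time-point means.
    mu = [sum(row[t] for row in y) / n for t in range(T)]
    # Sandwich covariance A^{-1} B A^{-1} with A = n*I and B = sum_i r_i r_i^T.
    # Using whole residual vectors keeps the within-subject covariances.
    resid = [[row[t] - mu[t] for t in range(T)] for row in y]
    V = [[sum(r[s] * r[t] for r in resid) / n**2 for t in range(T)]
         for s in range(T)]
    # Contrasts mu_t - mu_0 for t = 1..T-1; H0 says all of them are zero.
    d = [mu[t] - mu[0] for t in range(1, T)]
    # C V C^T for the contrast matrix C = [-1 | I].
    cvc = [[V[s][t] - V[s][0] - V[0][t] + V[0][0] for t in range(1, T)]
           for s in range(1, T)]
    # W = d^T (C V C^T)^{-1} d; asymptotically chi-square(T-1) under H0.
    return sum(di * xi for di, xi in zip(d, solve(cvc, d)))
```

For example, `robust_wald([[1.0, 2.0], [2.0, 4.0], [3.0, 5.0]])` returns approximately 37.5. With T = 2 the statistic equals n/(n - 1) times the squared paired t statistic, which shows how the sandwich form handles within-subject correlation: a subject-specific shift applied to all of a subject's time points cancels out of the contrasts. The contrast covariance is singular whenever there are fewer subjects than time points.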
