School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia.
Department of Statistics, University of California, Berkeley, CA 94720, USA.
Bioinformatics. 2018 Feb 15;34(4):617-624. doi: 10.1093/bioinformatics/btx641.
Capturing association patterns in gene expression levels under different conditions or time points is important for inferring gene regulatory interactions. In practice, temporal changes in gene expression may result in complex association patterns that require more sophisticated detection methods than simple correlation measures. For instance, the effect of regulation may lead to time-lagged associations and interactions local to a subset of samples. Furthermore, expression profiles of interest may not be aligned or directly comparable (e.g. gene expression profiles from two species).
We propose a count statistic for measuring association between pairs of gene expression profiles consisting of ordered samples (e.g. time-course), where correlation may exist only locally, in subsequences separated by a position shift. The statistic is simple and fast to compute, and we illustrate its use in two applications. In a cross-species comparison of developmental gene expression levels, we show that our method not only measures the association of gene expression between the two species, but also provides an alignment between their developmental stages. In the second application, we apply our statistic to expression profiles from two distinct phenotypic conditions, where the samples in each profile are ordered by the associated phenotypic values. The detected associations can be useful in building a correspondence between gene association networks under different phenotypes. On the theoretical side, we provide asymptotic distributions of the statistic for different regions of the parameter space and test its power on simulated data.
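To make the idea of a shift-tolerant count concrete, the following is a minimal sketch and not the statistic defined in the paper, whose exact form is not given in this abstract. It assumes that local association is scored by counting pairs of length-k windows, one from each profile, that share the same rank pattern and whose start positions differ by at most a fixed shift. The names `rank_pattern` and `local_shift_count` and the parameters `k` and `max_shift` are hypothetical.

```python
import numpy as np


def rank_pattern(window):
    """Return the ordinal (rank) pattern of a window as a tuple of ints.

    Ties are broken by position because a stable sort is used.
    """
    order = np.argsort(window, kind="stable")
    ranks = np.argsort(order, kind="stable")
    return tuple(int(r) for r in ranks)


def local_shift_count(x, y, k=3, max_shift=2):
    """Count window pairs with matching rank patterns (illustrative only).

    A pair (i, j) is counted when x[i:i+k] and y[j:j+k] have the same
    rank pattern and |i - j| <= max_shift. This is an assumed stand-in
    for a local count statistic, not the authors' definition.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    patterns_x = [rank_pattern(x[i:i + k]) for i in range(len(x) - k + 1)]
    patterns_y = [rank_pattern(y[j:j + k]) for j in range(len(y) - k + 1)]

    count = 0
    for i, pattern in enumerate(patterns_x):
        # Only compare against windows of y within the allowed shift.
        lo = max(0, i - max_shift)
        hi = min(len(patterns_y), i + max_shift + 1)
        for j in range(lo, hi):
            if pattern == patterns_y[j]:
                count += 1
    return count


if __name__ == "__main__":
    profile_a = [0, 2, 1, 5, 6, 7]
    profile_b = [9, 0, 2, 1, 5, 6]  # profile_a delayed by one position
    print(local_shift_count(profile_a, profile_b, k=3, max_shift=0))
    print(local_shift_count(profile_a, profile_b, k=3, max_shift=1))
```

In this sketch, allowing a shift of one position lets the delayed copy of the profile be matched window by window, whereas a zero-shift comparison misses most of the shared pattern. Assessing significance would require the null distribution of the count, which the paper derives asymptotically and which is not reproduced here.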
The code used to perform the analysis is available as part of the Supplementary Material.
msw@usc.edu or hhuang@stat.berkeley.edu.
Supplementary data are available at Bioinformatics online.