Liu Delong, Peddada Shyamal D, Li Leping, Weinberg Clarice R
Biostatistics Branch, National Institute of Environmental Health Sciences, MD: A3-03, 111 TW Alexander Dr, Research Triangle Park, NC 27709, USA.
BMC Bioinformatics. 2006 Feb 23;7:87. doi: 10.1186/1471-2105-7-87.
Recent circadian clock studies using gene expression microarrays in two different mouse tissues have revealed that not all circadian-related genes are synchronized in phase, or peak expression time, across tissues in vivo. Instead, the peak expression of some circadian-related genes may be delayed by 4-8 hours in one tissue relative to the other. These biological observations raise a statistical question: how can synchronized genes be distinguished from genes that are systematically lagged in phase/peak expression time across two tissues?
We propose a set of techniques from circular statistics to analyze the phase angles of circadian-related genes in two tissues. We first estimate the phase of each cycling gene separately in each tissue, and then use these estimates to compute the paired angular difference between the gene's phase angles in the two tissues. These differences are modeled as a mixture of two von Mises distributions, which enables us to cluster genes into two groups: one group of synchronized transcripts with the same phase in the two tissues, and another of transcripts whose phase differs between the two tissues. For each cluster of genes, we assess the association of phases across the tissue types using circular-circular regression. We also develop a bootstrap methodology, based on a circular-circular regression model, to evaluate the improvement in fit provided by a two-component mixture over a one-component von Mises model.
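The clustering step can be illustrated with a minimal sketch. It assumes the mixture is fitted by expectation-maximization (EM), which the abstract does not specify, and it uses a standard piecewise approximation to obtain the concentration parameter from the mean resultant length. The function names (`bessel_i0`, `kappa_from_rbar`, `fit_two_von_mises`), the starting values, and the cap on the concentration are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (adequate for x <= 50)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kappa_from_rbar(rbar, kappa_max=50.0):
    """Approximate concentration from the mean resultant length (assumed approximation)."""
    if rbar >= 1.0 - 1e-9:
        return kappa_max
    if rbar < 0.53:
        kappa = 2.0 * rbar + rbar ** 3 + 5.0 * rbar ** 5 / 6.0
    elif rbar < 0.85:
        kappa = -0.4 + 1.39 * rbar + 0.43 / (1.0 - rbar)
    else:
        kappa = 1.0 / (rbar ** 3 - 4.0 * rbar ** 2 + 3.0 * rbar)
    return min(kappa, kappa_max)


def fit_two_von_mises(angles, n_iter=200):
    """EM fit of a two-component von Mises mixture to angles in radians.

    Returns [(weight, mean_direction, concentration), ...], largest weight first.
    """
    mu = [0.0, math.pi]      # assumed starting values: opposite sides of the circle
    kappa = [1.0, 1.0]
    weight = [0.5, 0.5]
    n = len(angles)
    for _ in range(n_iter):
        # E-step: posterior probability that each angle belongs to each component.
        norm = [2.0 * math.pi * bessel_i0(k) for k in kappa]
        resp = []
        for t in angles:
            p = [weight[j] * math.exp(kappa[j] * math.cos(t - mu[j])) / norm[j]
                 for j in (0, 1)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: weighted circular mean and concentration for each component.
        for j in (0, 1):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # component has effectively vanished; leave it unchanged
            c = sum(r[j] * math.cos(t) for r, t in zip(resp, angles))
            s = sum(r[j] * math.sin(t) for r, t in zip(resp, angles))
            weight[j] = rj / n
            mu[j] = math.atan2(s, c)
            kappa[j] = kappa_from_rbar(math.hypot(c, s) / rj)
    order = sorted((0, 1), key=lambda j: -weight[j])
    return [(weight[j], mu[j], kappa[j]) for j in order]
```

Given a list of paired phase differences in radians, `fit_two_von_mises(differences)` returns the estimated proportion, mean phase difference, and concentration of each cluster. EM can converge to a local optimum, so in practice several starting values would normally be tried.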
We applied the proposed methodologies to the circadian-related genes common to heart and liver tissues in Storch et al. [2]. We estimated that 80% of these transcripts were synchronized in phase, and that the remaining 20% were lagged by about 8 hours in liver relative to heart. The bootstrap p-value for the hypothesis of a single cluster is 0.063, which suggests the possibility of two clusters. Our methodologies can be extended to analyze peak expression times of circadian-related genes across more than two tissues, for example, kidney, heart, liver, and the suprachiasmatic nuclei (SCN) of the hypothalamus.
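For orientation, a time lag can be expressed as a phase angle. Assuming a 24-hour cycle mapped onto the full circle (an assumption consistent with circadian rhythms, though the abstract does not state the mapping explicitly), the 8-hour lag corresponds to:

```latex
\[
\Delta\phi \;=\; 2\pi \cdot \frac{8\ \text{h}}{24\ \text{h}} \;=\; \frac{2\pi}{3}\ \text{rad} \;=\; 120^{\circ}
\]
```

Under the same mapping, the 4-8 hour range of delays reported in earlier studies spans phase differences of 60° to 120°.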