Liew Alan Wee-Chung, Xian Jun, Wu Shuanhu, Smith David, Yan Hong
School of Information & Communication Technology, Griffith University, Brisbane, Australia.
BMC Bioinformatics. 2007 Apr 24;8:137. doi: 10.1186/1471-2105-8-137.
Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time series data is identifying genes that are periodically expressed. The challenge arises because observed time series usually exhibit non-idealities such as noise, short length, and unevenly sampled time points. Most methods in the literature operate on evenly sampled time series and are not suitable for unevenly sampled ones.
For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, because the Lomb-Scargle method assumes a single stationary sinusoidal wave with infinite support, it introduces spurious periodic components into the periodogram when the data have finite length. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm outperforms both the Lomb-Scargle algorithm and the classical Fourier periodogram based method in detecting periodically expressed genes. We have applied our algorithm to Plasmodium falciparum and yeast gene expression data, and the results show that it is able to detect biologically meaningful periodically expressed genes.
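To make the Lomb-Scargle baseline discussed above concrete, the following is a minimal sketch of the standard normalized Lomb-Scargle periodogram in plain Python. It is not the algorithm proposed in the paper. The sample times, the 12-hour period, the noise level, and the frequency grid are all illustrative assumptions rather than values taken from the study.

```python
import math
import random


def lomb_scargle(times, values, freqs):
    """Normalized Lomb-Scargle periodogram for unevenly sampled data."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal
        # at this frequency, whatever the sampling pattern is.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
        ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power


# Hypothetical expression profile: 24 uneven samples over 48 hours,
# a 12-hour cycle plus Gaussian noise (all values are assumptions).
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 48.0) for _ in range(24))
values = [math.sin(2.0 * math.pi * t / 12.0) + rng.gauss(0.0, 0.2) for t in times]

# Candidate frequencies in cycles per hour (zero is excluded).
freqs = [k / 200.0 for k in range(2, 51)]
power = lomb_scargle(times, values, freqs)
peak_freq = freqs[max(range(len(power)), key=power.__getitem__)]
print("peak frequency (cycles/hour):", peak_freq)
```

In this synthetic setting the strongest peak should fall near 1/12 cycles per hour. The secondary peaks visible in such a periodogram for short records are the kind of spurious components the abstract refers to.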
We have proposed an effective method for identifying periodically expressed genes in unevenly sampled microarray time series gene expression data. The method can also serve as a tool for interpolating or resampling gene expression time series.
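The resampling use mentioned above can be illustrated with a small sketch: fit the coefficients of a shifted cubic B-spline expansion to unevenly spaced samples by least squares, then evaluate the fitted curve on an even grid. This is only a simplified illustration under stated assumptions, not the authors' reconstruction algorithm. The knot spacing, the small ridge term, the jittered sampling pattern, and all function names are hypothetical choices made for the example.

```python
import math


def cubic_bspline(x):
    """Centered cubic B-spline with support [-2, 2]."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + 0.5 * ax ** 3
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_bspline(times, values, spacing, t_max, ridge=1e-6):
    """Least-squares coefficients for f(t) = sum_k c_k * B3(t / spacing - k)."""
    shifts = list(range(-1, int(math.ceil(t_max / spacing)) + 2))
    design = [[cubic_bspline(t / spacing - k) for k in shifts] for t in times]
    n = len(shifts)
    # Normal equations with a tiny ridge term (an assumption) for stability.
    ata = [[sum(row[i] * row[j] for row in design) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * v for row, v in zip(design, values)) for i in range(n)]
    return shifts, solve(ata, aty)


def evaluate(shifts, coeffs, spacing, t):
    return sum(c * cubic_bspline(t / spacing - k) for k, c in zip(shifts, coeffs))


# Hypothetical uneven sampling: a 1.2-hour grid with deterministic jitter.
times = [1.2 * i + 0.5 * math.sin(2.3 * i) for i in range(41)]
values = [math.sin(2.0 * math.pi * t / 12.0) for t in times]

spacing = 3.0  # assumed knot spacing in hours
shifts, coeffs = fit_bspline(times, values, spacing, t_max=48.0)

# Resample on an even hourly grid and compare with the known test signal.
grid = [float(t) for t in range(48)]
resampled = [evaluate(shifts, coeffs, spacing, t) for t in grid]
truth = [math.sin(2.0 * math.pi * t / 12.0) for t in grid]
rms_error = math.sqrt(sum((r - y) ** 2 for r, y in zip(resampled, truth)) / len(grid))
print("RMS resampling error:", rms_error)
```

Once the series is on an even grid, a classical Fourier periodogram can be applied to the resampled values. The paper instead describes a direct spectral estimate from the B-spline representation, which this sketch does not attempt to reproduce.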