Andersson Claes R, Isaksson Anders, Gustafsson Mats G
The Linnaeus Centre for Bioinformatics, BMC, Uppsala University, Box 598, S-751 24 Uppsala, Sweden.
BMC Bioinformatics. 2006 Feb 9;7:63. doi: 10.1186/1471-2105-7-63.
Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell cycle in poorly characterised organisms. Commonly, the investigator is interested only in genes expressed at a particular frequency that characterizes the process under study, but this frequency is seldom known exactly. Previously proposed detector designs require access to labelled training examples and do not allow systematic incorporation of diffuse prior knowledge about the period time.
A learning-free Bayesian detector that does not rely on labelled training examples and allows incorporation of prior knowledge about the period time is introduced. It is shown to outperform two recently proposed alternative learning-free detectors on simulated data generated with models that are different from the one used for detector design. Results from applying the detector to mRNA expression time profiles from S. cerevisiae show that the genes detected as periodically expressed contain only a small fraction of the cell-cycle genes inferred from mutant phenotype. For example, when the probability of false alarm was equal to 7%, only 12% of the cell-cycle genes were detected. The genes detected as periodically expressed were found to have a statistically significant overrepresentation of known cell-cycle regulated sequence motifs. One known sequence motif and 18 putative motifs, previously not associated with periodic expression, were also overrepresented.
In comparison with recently proposed alternative learning-free detectors for periodic gene expression, Bayesian inference allows systematic incorporation of diffuse a priori knowledge about, e.g., the period time. This results in relative performance improvements due to increased robustness against errors in the underlying assumptions. Results from applying the detector to mRNA expression time profiles from S. cerevisiae include several new findings that merit further experimental study.
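To make the idea of incorporating diffuse prior knowledge about the period time concrete, the following is a minimal sketch and not the detector described in this paper. It assumes a discrete grid of candidate periods with a Gaussian-shaped prior, and scores a time profile by a prior-weighted average of a maximized likelihood ratio (mean plus sinusoid versus mean only), which is a simplification of a full Bayesian treatment. All function names, the nominal period of 60, its spread of 10, and the simulated profiles are hypothetical choices made for illustration.

```python
import math
import random


def sinusoid_rss(times, values, period):
    """Residual sum of squares after a least-squares fit of mean + one sinusoid."""
    n = len(times)
    w = 2.0 * math.pi / period
    y_mean = sum(values) / n
    y = [v - y_mean for v in values]
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    # Centering the regressors is equivalent to fitting an intercept.
    c_mean, s_mean = sum(c) / n, sum(s) / n
    c = [x - c_mean for x in c]
    s = [x - s_mean for x in s]
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(p * q for p, q in zip(c, s))
    cy = sum(p * q for p, q in zip(c, y))
    sy = sum(p * q for p, q in zip(s, y))
    det = cc * ss - cs * cs
    if abs(det) < 1e-12:  # degenerate design: fall back to the mean-only fit
        return sum(v * v for v in y)
    a = (cy * ss - sy * cs) / det
    b = (sy * cc - cy * cs) / det
    return sum((yi - a * ci - b * si) ** 2 for yi, ci, si in zip(y, c, s))


def gaussian_period_prior(periods, nominal, spread):
    """Normalized Gaussian-shaped weights over a grid of candidate periods."""
    raw = [math.exp(-0.5 * ((p - nominal) / spread) ** 2) for p in periods]
    total = sum(raw)
    return [r / total for r in raw]


def log_periodicity_score(times, values, periods, prior):
    """Log of the prior-weighted average of (RSS_null / RSS_period)^(n/2)."""
    n = len(times)
    y_mean = sum(values) / n
    rss0 = sum((v - y_mean) ** 2 for v in values)
    if rss0 <= 0.0:  # constant profile: no evidence either way
        return 0.0
    terms = []
    for period, weight in zip(periods, prior):
        rss = max(sinusoid_rss(times, values, period), 1e-12 * rss0)
        terms.append(math.log(weight) + 0.5 * n * (math.log(rss0) - math.log(rss)))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Illustrative use on simulated profiles (all values are made up).
rng = random.Random(0)
times = [10.0 * i for i in range(20)]
periods = [40.0 + 2.0 * i for i in range(21)]  # candidate periods 40..80
prior = gaussian_period_prior(periods, nominal=60.0, spread=10.0)

periodic = [math.sin(2.0 * math.pi * t / 66.0) + 0.3 * rng.gauss(0.0, 1.0) for t in times]
noise = [rng.gauss(0.0, 1.0) for _ in times]

periodic_score = log_periodicity_score(times, periodic, periods, prior)
noise_score = log_periodicity_score(times, noise, periods, prior)
print(periodic_score > noise_score)
```

Because the score averages over the whole period grid rather than committing to a single frequency, a profile whose true period (66 here) differs from the nominal guess (60) can still score well, which is the kind of robustness to an imprecisely known period that the abstract attributes to the Bayesian approach.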