Hutchison Alan L, Maienschein-Cline Mark, Chiang Andrew H, Tabei S M Ali, Gudjonson Herman, Bahroos Neil, Allada Ravi, Dinner Aaron R
Medical Scientist Training Program, University of Chicago, Chicago, Illinois, United States of America; Graduate Program in the Biophysical Sciences, University of Chicago, Chicago, Illinois, United States of America; James Franck Institute, University of Chicago, Chicago, Illinois, United States of America.
Center for Research Informatics, University of Illinois at Chicago, Chicago, Illinois, United States of America.
PLoS Comput Biol. 2015 Mar 20;11(3):e1004094. doi: 10.1371/journal.pcbi.1004094. eCollection 2015 Mar.
Robust methods for identifying patterns of expression in genome-wide data are important for generating hypotheses regarding gene function. To this end, several analytic methods have been developed for detecting periodic patterns. We improve one such method, JTK_CYCLE, by explicitly calculating the null distribution such that it accounts for multiple hypothesis testing and by including non-sinusoidal reference waveforms. We term this method empirical JTK_CYCLE with asymmetry search, and we compare its performance to JTK_CYCLE with Bonferroni and Benjamini-Hochberg multiple hypothesis testing correction, as well as to five other methods: cyclohedron test, address reduction, stable persistence, ANOVA, and F24. We find that ANOVA, F24, and JTK_CYCLE consistently outperform the other three methods when data are limited and noisy; empirical JTK_CYCLE with asymmetry search gives the greatest sensitivity while controlling for the false discovery rate. Our analysis also provides insight into experimental design and we find that, for a fixed number of samples, better sensitivity and specificity are achieved with higher numbers of replicates than with higher sampling density. Application of the methods to detecting circadian rhythms in a metadataset of microarrays that quantify time-dependent gene expression in whole heads of Drosophila melanogaster reveals annotations that are enriched among genes with highly asymmetric waveforms. These include a wide range of oxidation reduction and metabolic genes, as well as genes with transcripts that have multiple splice forms.
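The central idea of the abstract, scoring a time series against a family of asymmetric reference waveforms and then judging the best score against an explicitly computed null distribution, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the triangle-shaped references, the phase and asymmetry grids, the use of Kendall's tau-a, and the permutation count are all assumptions chosen for illustration. Because the null is built from the best score over all references, searching many waveforms is already accounted for in the resulting p-value.

```python
import random


def triangle_wave(t, period, phase, asymmetry):
    """Illustrative reference waveform: trough (0) at `phase`, peak (1) after
    asymmetry * period, then a linear fall. `asymmetry` must lie in (0, 1)."""
    x = ((t - phase) % period) / period  # position within the cycle, in [0, 1)
    if x < asymmetry:
        return x / asymmetry
    return (1.0 - x) / (1.0 - asymmetry)


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def reference_set(times, period=24.0, phases=range(0, 24, 4),
                  asymmetries=(0.25, 0.5, 0.75)):
    """Hypothetical grid of reference waveforms sampled at `times`."""
    return [[triangle_wave(t, period, ph, asym) for t in times]
            for ph in phases for asym in asymmetries]


def best_tau(refs, values):
    """Best (largest) correlation over every reference waveform."""
    return max(kendall_tau(ref, values) for ref in refs)


def empirical_p(times, values, n_perm=200, seed=0):
    """Permutation p-value for the best statistic over all references."""
    rng = random.Random(seed)
    refs = reference_set(times)
    observed = best_tau(refs, values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any relation between time and value
        if best_tau(refs, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    noise = random.Random(1)
    times = list(range(0, 48, 2))  # two days sampled every 2 hours
    rhythmic = [triangle_wave(t, 24.0, 4, 0.25) + noise.gauss(0, 0.05)
                for t in times]
    tau, p = empirical_p(times, rhythmic)
    print(f"best tau = {tau:.2f}, empirical p = {p:.4f}")
```

With only a few hundred permutations the smallest attainable p-value is 1/(n_perm + 1), so a genome-wide analysis would need far more permutations or a fitted parametric approximation to the null.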