Department of Ecology and Evolution, Batiment Biophore, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Batiment Génopode, Quartier UNIL-Sorge, Université de Lausanne, Lausanne, Switzerland.
PLoS Comput Biol. 2020 Mar 17;16(3):e1007666. doi: 10.1371/journal.pcbi.1007666. eCollection 2020 Mar.
The nycthemeral transcriptome comprises all genes whose mRNA levels vary rhythmically with a 24-hour period, including but not restricted to circadian genes. In this study, we show that nycthemeral rhythmicity at the gene expression level is biologically functional and that this functionality is more conserved between orthologous genes than between random genes. We used this conservation of rhythmic expression to assess the ability of seven methods (ARSER, Lomb-Scargle, RAIN, JTK, empirical-JTK, GeneCycle, and meta2d) to detect rhythmic signal in gene expression. We contrasted them with a naive method that is not based on rhythmic parameters. By taking into account the tissue-specificity of rhythmic gene expression and different species comparisons, we show that no method is strongly favored. The results show that these methods designed for rhythm detection, in addition to having quite similar performances, are consistent only among genes with a strong rhythm signal. For instance, rhythmic genes defined with a standard p-value threshold of 0.01 could include genes whose rhythmicity is biologically irrelevant. Although these results depend on the datasets used and on the evolutionary distance between the species compared, we call for caution about the results of studies reporting or using large sets of rhythmic genes. Furthermore, given the analysis of the behaviors of the methods on real and randomized data, we recommend using primarily ARSER, empirical-JTK, or GeneCycle, which meet the expectations of a classical distribution of p-values. Experimental design should also take into account the circumstances under which the methods seem more efficient, such as giving priority to biological replicates over the number of time-points, or to the number of time-points over the quality of the technique (microarray vs RNA-seq). GeneCycle, and to a lesser extent empirical-JTK, might be the most robust methods when applied to weakly informative datasets.
Finally, our analyses suggest that rhythmic genes are mainly highly expressed genes.
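
The abstract's criterion that a rhythm-detection method should yield a classical distribution of p-values can be illustrated with a minimal sketch. A well-behaved test should return approximately uniform p-values on data with no rhythm, and very small p-values on a clearly rhythmic profile. The code below does not implement any of the seven benchmarked methods: it uses a simple cosinor (harmonic regression) F-test as a stand-in. The sampling design (every 4 h over 48 h), the Gaussian noise model, and all function names are assumptions made for illustration.

```python
import math
import random

PERIOD = 24.0                      # hours
TIMES = list(range(0, 48, 4))      # assumed design: 12 time-points over 2 days


def cosinor_pvalue(times, values, period=PERIOD):
    """F-test p-value for a 24 h cosine fit versus a flat mean.

    The closed form below assumes evenly spaced time-points covering a whole
    number of periods, so the cosine and sine regressors are orthogonal to
    each other and to the intercept.
    """
    n = len(values)
    w = 2.0 * math.pi / period
    mean = sum(values) / n
    cos_t = [math.cos(w * t) for t in times]
    sin_t = [math.sin(w * t) for t in times]
    # Least-squares coefficients of the two harmonic regressors.
    a = sum(c * (v - mean) for c, v in zip(cos_t, values)) / sum(c * c for c in cos_t)
    b = sum(s * (v - mean) for s, v in zip(sin_t, values)) / sum(s * s for s in sin_t)
    sse = sum((v - mean - a * c - b * s) ** 2
              for v, c, s in zip(values, cos_t, sin_t))
    sst = sum((v - mean) ** 2 for v in values)
    if sst == 0.0:
        return 1.0                 # constant profile: no evidence of rhythm
    dof = n - 3                    # residual degrees of freedom
    # With 2 numerator degrees of freedom, the F survival function
    # simplifies to (SSE / SST) ** (dof / 2).
    return min(1.0, (sse / sst) ** (dof / 2.0))


def simulate_null_pvalues(n_genes, seed=0):
    """P-values for arrhythmic 'genes' made of pure Gaussian noise."""
    rng = random.Random(seed)
    return [cosinor_pvalue(TIMES, [rng.gauss(0.0, 1.0) for _ in TIMES])
            for _ in range(n_genes)]


def ks_distance_to_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the p-values and Uniform(0, 1)."""
    xs = sorted(pvalues)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def rhythmic_profile(seed=1, amplitude=3.0, noise_sd=0.5):
    """A hypothetical gene peaking at 6 h, with additive Gaussian noise."""
    rng = random.Random(seed)
    w = 2.0 * math.pi / PERIOD
    return [10.0 + amplitude * math.cos(w * (t - 6.0)) + rng.gauss(0.0, noise_sd)
            for t in TIMES]
```

Calling `ks_distance_to_uniform(simulate_null_pvalues(2000))` gives a small distance when the null p-values are close to uniform, whereas `cosinor_pvalue(TIMES, rhythmic_profile())` should fall far below a 0.01 threshold. A method whose null p-values pile up near 0 or 1 would fail this kind of check.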