Walter Wencke, Striberny Bernd, Gaquerel Emmanuel, Baldwin Ian T, Kim Sang-Gyu, Heiland Ines
Department of Molecular Ecology, Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745, Jena, Germany.
Department of Arctic and Marine Biology, UiT The Arctic University of Norway, Naturfagbygget, Dramsvegen 201, 9037, Tromsø, Norway.
BMC Bioinformatics. 2014 Oct 25;15(1):352. doi: 10.1186/s12859-014-0352-8.
As time series experiments in higher eukaryotes usually obtain data from different individuals collected at different time points, a time series sample itself is not equivalent to a true biological replicate but is, rather, a combination of several biological replicates. The analysis of expression data derived from a time series sample is therefore often performed with a low number of replicates due to budget limitations or limitations in sample availability. In addition, most algorithms developed to identify specific patterns in time series datasets do not consider biological variation in samples collected under the same conditions.
Using artificial time course datasets, we show that resampling considerably improves the accuracy with which transcripts are identified as rhythmic. In particular, the number of false positives can be greatly reduced, while the number of true positives remains in the range of other methods currently used to determine rhythmically expressed genes.
The resampling approach described here therefore increases the accuracy of time series expression data analysis and furthermore emphasizes the importance of biological replicates in identifying oscillating genes. Resampling can be used for any time series expression dataset as long as the samples are acquired from independent individuals at each time point.
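The resampling idea can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: each resampled series is built by drawing one replicate at random per time point, rhythmicity is scored with a simple cosine correlation (a stand-in, not the detection method used in the paper), and the function names, the 24 h period, and the 0.8 threshold are hypothetical.

```python
import numpy as np


def resample_time_series(data, n_resamples, rng):
    """Build resampled series from replicate measurements.

    data: array of shape (n_timepoints, n_replicates); each column entry is
    assumed to come from an independent individual at that time point.
    Returns an array of shape (n_resamples, n_timepoints) in which every row
    combines one randomly chosen replicate per time point.
    """
    n_timepoints, n_replicates = data.shape
    picks = rng.integers(0, n_replicates, size=(n_resamples, n_timepoints))
    return data[np.arange(n_timepoints), picks]


def cosine_score(series, times, period=24.0):
    """Best Pearson correlation between a series and phase-shifted cosines.

    This is a placeholder rhythmicity score chosen for the sketch only.
    """
    best = 0.0
    for phase in np.linspace(0.0, period, 24, endpoint=False):
        template = np.cos(2.0 * np.pi * (times - phase) / period)
        r = np.corrcoef(series, template)[0, 1]
        best = max(best, r)
    return best


def rhythmic_fraction(data, times, n_resamples=200, threshold=0.8, seed=0):
    """Fraction of resampled series whose score reaches the threshold."""
    rng = np.random.default_rng(seed)
    resampled = resample_time_series(data, n_resamples, rng)
    calls = [cosine_score(row, times) >= threshold for row in resampled]
    return float(np.mean(calls))
```

Under these assumptions, a transcript would be reported as rhythmic only if it is called rhythmic in a large fraction of resampled series, so a pattern that depends on one particular combination of replicates is less likely to pass.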