Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
Institute for Data Exploration and Applications, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
Bioinformatics. 2021 May 5;37(6):767-774. doi: 10.1093/bioinformatics/btaa877.
Circadian rhythms are approximately 24-h endogenous cycles that control many biological functions. To identify these rhythms, biological samples are taken over circadian time and analyzed using a single omics type, such as transcriptomics or proteomics. Comparisons of data from these single-omics approaches have shown that transcriptional rhythms are not necessarily conserved at the protein level, implying extensive circadian post-transcriptional regulation. However, because proteomics methods are known to be noisier than transcriptomic methods, some proteins previously classified as arrhythmic despite having rhythmic transcripts may in fact be rhythmic: their rhythms could have been missed due to noise rather than removed by post-transcriptional regulation.
To determine whether information from less noisy transcriptomic data can inform rhythm detection in noisier proteomic data, and thus identify rhythms in the proteome more accurately, we have created the Multi-Omics Selection with Amplitude Independent Criteria (MOSAIC) application. MOSAIC combines model selection and joint modeling of multiple omics types to recover significant circadian and non-circadian trends. Using both synthetic data and proteomic data from Neurospora crassa, we showed that MOSAIC accurately recovers circadian rhythms at higher rates not only in the proteome but in the transcriptome as well, outperforming existing methods for rhythm identification. Furthermore, by quantifying non-circadian as well as circadian trends in the data, our methodology made it possible to recognize the diversity of circadian regulation as compared to non-circadian regulation.
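The idea of choosing between circadian and non-circadian trends by model selection can be illustrated with a minimal sketch. This is not MOSAIC's implementation: the two candidate models (a straight line and a fixed 24-h cosine), the use of the Bayesian information criterion, and the names `fit_bic` and `select_trend` are all assumptions made for illustration, and the sketch does not cover joint modeling across omics types.

```python
import numpy as np


def fit_bic(design, y):
    """Fit y by ordinary least squares and return the BIC (Gaussian errors)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    rss = float(resid @ resid)
    return n * np.log(rss / n) + k * np.log(n)


def select_trend(t, y, period=24.0):
    """Return the name of the lower-BIC model and the BIC of each candidate.

    Hypothetical candidates: a linear (non-circadian) trend and a
    fixed-period cosine (circadian) trend. With the period fixed, both
    models are linear in their parameters.
    """
    ones = np.ones_like(t)
    omega = 2.0 * np.pi / period
    designs = {
        "linear": np.column_stack([ones, t]),
        "cosine": np.column_stack([ones, np.cos(omega * t), np.sin(omega * t)]),
    }
    bics = {name: fit_bic(X, y) for name, X in designs.items()}
    return min(bics, key=bics.get), bics


# Synthetic example: 48 h sampled every 2 h, with Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(0, 48, 2.0)
rhythmic = 10 + 3 * np.cos(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)
trending = 10 + 0.2 * t + rng.normal(0, 0.5, t.size)

print(select_trend(t, rhythmic)[0])
print(select_trend(t, trending)[0])
```

Because the criterion compares goodness of fit penalized by parameter count, rather than thresholding on amplitude, a low-amplitude but consistent rhythm can still be selected over the linear alternative in this sketch.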
MOSAIC's full interface is available at https://github.com/delosh653/MOSAIC. An R package for this functionality, mosaic.find, can be downloaded at https://CRAN.R-project.org/package=mosaic.find.
Supplementary data are available at Bioinformatics online.