Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis.

Author information

Heerah Sachin, Molinari Roberto, Guerrier Stéphane, Marshall-Colon Amy

Affiliations

Department of Plant Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Department of Mathematics and Statistics, Auburn University, Auburn, AL 36849, USA.

Publication information

Bioinformatics. 2021 Aug 25;37(16):2450-2460. doi: 10.1093/bioinformatics/btab126.

Abstract

MOTIVATION

Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points, which make the identification of causal relationships across organs analytically challenging. Existing statistical models, designed for regular time series with abundant time points, may fail to reveal biologically significant causal relationships when applied to sparse data. With increasing research interest in biological time series data, there is a need for new statistical methods that can determine causality within and between time series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships in unevenly spaced, multivariate time series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal.

RESULTS

This work delivers a statistical approach for modelling irregularly sampled bivariate signals. The approach embeds functions from the domain of engineering that adapt the model's dependence structure to the specific sampling times. Once the parameters of this model have been estimated by maximum likelihood for each bivariate time series, Granger causality can be tested using bootstrap procedures for small samples (or asymptotics for large samples). When applied to the A. thaliana data, the proposed approach produced 3078 significant interactions, of which 2012 have root causal genes and 1066 have shoot causal genes. Many of the predicted causal and target genes are known players in local and long-distance nitrogen signalling, including genes encoding transcription factors, hormones and signalling peptides. Of the 1007 total causal genes (either organ), 384 are either known or predicted mobile transcripts, suggesting that the identified causal genes may be directly involved in long-distance nitrogen signalling through intercellular interactions. The model predictions and subsequent network analysis identified nitrogen-responsive genes that can be further tested for their specific roles in long-distance nitrogen signalling.
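The testing pipeline described above (a gap-aware dependence structure, maximum-likelihood fitting, then a bootstrap test) can be illustrated with a minimal Python sketch. It is not the authors' model and does not use the 'irg' package: the exponential damping `exp(-dt / tau)`, the fixed `tau`, the one-directional test, and every function name here are assumptions chosen to keep the example short. With `tau` held fixed the model is linear, so Gaussian maximum likelihood reduces to least squares.

```python
import math
import random

def weights(times, tau):
    """Assumed damping exp(-dt/tau): the longer the gap, the weaker the dependence."""
    return [math.exp(-(t1 - t0) / tau) for t0, t1 in zip(times, times[1:])]

def fit(y, x, w, use_x):
    """Least-squares (Gaussian ML) fit of y[i+1] = w[i]*(a*y[i] + b*x[i]) + noise.

    With use_x=False the coefficient b is forced to 0 (x does not help predict y).
    Returns (a, b, residual sum of squares).
    """
    u = [wi * yi for wi, yi in zip(w, y[:-1])]  # damped own lag
    v = [wi * xi for wi, xi in zip(w, x[:-1])]  # damped lag of the candidate cause
    t = y[1:]
    suu = sum(p * p for p in u)
    suv = sum(p * q for p, q in zip(u, v))
    svv = sum(q * q for q in v)
    sut = sum(p * r for p, r in zip(u, t))
    svt = sum(q * r for q, r in zip(v, t))
    if use_x:
        det = suu * svv - suv * suv  # 2x2 normal equations
        a = (sut * svv - svt * suv) / det
        b = (svt * suu - sut * suv) / det
    else:
        a, b = sut / suu, 0.0
    rss = sum((r - a * p - b * q) ** 2 for p, q, r in zip(u, v, t))
    return a, b, rss

def lr_stat(y, x, w):
    """Likelihood-ratio statistic comparing the fit with and without x."""
    _, _, rss0 = fit(y, x, w, use_x=False)
    _, _, rss1 = fit(y, x, w, use_x=True)
    return len(w) * math.log(rss0 / rss1)

def simulate(y0, x, w, a, b, sigma, rng):
    """Generate y from the assumed model, keeping x and the sampling gaps fixed."""
    y = [y0]
    for i, wi in enumerate(w):
        y.append(wi * (a * y[-1] + b * x[i]) + rng.gauss(0.0, sigma))
    return y

def bootstrap_pvalue(y, x, times, tau, n_boot=200, seed=0):
    """Parametric bootstrap p-value for the null 'x does not Granger-cause y'."""
    rng = random.Random(seed)
    w = weights(times, tau)
    observed = lr_stat(y, x, w)
    a0, _, rss0 = fit(y, x, w, use_x=False)   # model fitted under the null
    sigma0 = math.sqrt(rss0 / len(w))
    hits = 0
    for _ in range(n_boot):
        yb = simulate(y[0], x, w, a0, 0.0, sigma0, rng)
        hits += lr_stat(yb, x, w) >= observed
    return (hits + 1) / (n_boot + 1)
```

Calling `bootstrap_pvalue(y, x, times, tau)` with two equally long series and their shared sampling times returns a value in (0, 1]; small values suggest that lagged `x` improves the prediction of `y` under this assumed model. The bootstrap is used instead of a chi-squared approximation because, with only a handful of time points, asymptotic results are unlikely to be reliable, which matches the small-sample reasoning in the abstract.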

AVAILABILITY AND IMPLEMENTATION

The method was developed with the R statistical software and is made available through the R package 'irg', hosted on the GitHub repository https://github.com/SMAC-Group/irg, where a worked example vignette can also be found (https://smac-group.github.io/irg/articles/vignette.html). A few signals from the original data set are included in the package as an example for applying the method, and the complete A. thaliana data can be found at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97500.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


