A skellam model to identify differential patterns of gene expression induced by environmental signals.

Authors

Jiang Libo, Mao Ke, Wu Rongling

Affiliation

Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China.

Publication

BMC Genomics. 2014 Sep 8;15(1):772. doi: 10.1186/1471-2164-15-772.

DOI: 10.1186/1471-2164-15-772

PMID: 25199446

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4167515/

Abstract

BACKGROUND

RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks that have become increasingly important.

RESULTS

We proposed and verified a clustering algorithm, based on a Skellam model, that partitions genes into distinct groups according to their pattern of expression in response to changing conditions or in different tissues. The algorithm uses the Skellam distribution to capture the count property of RNA-seq data and clusters genes across different environments. A two-stage hierarchical expectation-maximization (EM) algorithm was implemented to estimate the optimal number of groups and the mean expression level of each group in each of two environments. A procedure was formulated to test whether and how a given group shows a plastic response to environmental changes. The model was used to analyze an RNA-seq dataset measured from reciprocal crosses of early Arabidopsis thaliana embryos that respond differently based on the extent of maternal and paternal genome contributions, from which genes associated with maternal and paternal contributions were identified. Simulation studies were also performed to validate the statistical behavior of the model.

CONCLUSIONS

This model is a useful tool for clustering gene expression data measured by RNA-seq, and thus facilitates our understanding of gene functions and networks.
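The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the difference in a gene's read counts between two environments is treated as a Skellam variable (the difference of two independent Poisson counts), and it shows a single E-step of a mixture model that assigns genes to groups. The function names, the series cutoff `terms`, the group means, and the example count differences are all hypothetical.

```python
import math


def poisson_logpmf(n, mu):
    """Log-probability of observing count n under a Poisson with mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2, terms=200):
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    Evaluated as a truncated sum over the joint Poisson probabilities.
    `terms` is an illustrative cutoff that is adequate for small means.
    """
    start = max(0, -k)  # smallest N2 that keeps N1 = N2 + k non-negative
    total = 0.0
    for j in range(start, start + terms):
        total += math.exp(poisson_logpmf(j + k, mu1) + poisson_logpmf(j, mu2))
    return total


def responsibilities(diffs, weights, params):
    """E-step: posterior probability that each gene belongs to each group.

    diffs   -- per-gene count differences between the two environments
    weights -- mixing proportions of the groups (sum to 1)
    params  -- one (mu1, mu2) pair of environment-specific means per group
    """
    result = []
    for d in diffs:
        joint = [w * skellam_pmf(d, m1, m2) for w, (m1, m2) in zip(weights, params)]
        norm = sum(joint)
        result.append([p / norm for p in joint])
    return result


# Hypothetical example: group 0 has equal means (no response to the
# environment), group 1 is expressed more strongly in the first environment.
diffs = [-6, -1, 0, 5]
weights = [0.5, 0.5]
params = [(2.0, 2.0), (8.0, 3.0)]
posterior = responsibilities(diffs, weights, params)
for d, row in zip(diffs, posterior):
    print(d, [round(p, 3) for p in row])
```

In a full EM procedure, this E-step would alternate with an M-step that re-estimates the mixing proportions and group means from the responsibilities, and the number of groups would be chosen by a model-selection criterion. Those steps are omitted here because the abstract does not specify them.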
