FunPat: function-based pattern analysis on RNA-seq time series data.

Author information

Sanavia Tiziana, Finotello Francesca, Di Camillo Barbara

Publication information

BMC Genomics. 2015;16(Suppl 6):S2. doi: 10.1186/1471-2164-16-S6-S2. Epub 2015 Jun 1.

Abstract

BACKGROUND

Dynamic expression data, now commonly obtained by high-throughput RNA sequencing, are essential for monitoring transient changes in gene expression and for studying the dynamics of transcriptional activity in the cell or in response to stimuli. Several methods for data selection, clustering and functional analysis are available; however, these steps are usually performed independently, without exploiting and integrating the information derived from each step of the analysis.

METHODS

Here we present FunPat, an R package for time series RNA sequencing data that integrates gene selection, clustering and functional annotation into a single framework. FunPat exploits functional annotations by performing for each functional term, e.g. a Gene Ontology term, an integrated selection-clustering analysis to select differentially expressed genes that share, besides annotation, a common dynamic expression profile.
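The idea of combining selection and clustering within each functional term can be illustrated with a small Python sketch. This is not FunPat's algorithm or API: the function `select_by_term`, the thresholds `alpha` and `min_corr`, the gene names and the term label are all hypothetical, and a simple Pearson correlation against the mean profile of the significant "seed" genes stands in for whatever model the package actually uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_by_term(profiles, p_values, annotations, alpha=0.05, min_corr=0.9):
    """For each term, build a pattern from seed genes and recruit similar genes.

    Assumption (illustrative only): seeds are genes with p-value <= alpha,
    the pattern is their mean profile, and any other gene annotated to the
    same term is added when its correlation with the pattern is >= min_corr.
    """
    selected = {}
    for term, genes in annotations.items():
        seeds = [g for g in genes if p_values[g] <= alpha]
        if not seeds:
            continue  # no statistical evidence to anchor a pattern for this term
        n_times = len(profiles[seeds[0]])
        pattern = [
            sum(profiles[g][t] for g in seeds) / len(seeds) for t in range(n_times)
        ]
        recruited = [
            g for g in genes
            if g in seeds or pearson(profiles[g], pattern) >= min_corr
        ]
        selected[term] = sorted(recruited)
    return selected


# Made-up toy data: three genes sharing one hypothetical annotation term.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 1.9, 3.2],
    "geneC": [3.0, 2.0, 1.0, 0.0],
}
p_values = {"geneA": 0.01, "geneB": 0.20, "geneC": 0.30}
annotations = {"GO:hypothetical_term": ["geneA", "geneB", "geneC"]}

result = select_by_term(profiles, p_values, annotations)
print(result)
```

In this toy case geneB misses the significance threshold on its own but follows the same rising profile as the seed geneA, so it is recruited, while geneC, which shares the annotation but has the opposite trend, is not.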

RESULTS

FunPat performance was assessed on both simulated and real data. With respect to a stand-alone selection step, the integration of the clustering step is able to improve the recall without altering the false discovery rate. FunPat also shows high precision and recall in detecting the correct temporal expression patterns; in particular, the recall is significantly higher than hierarchical, k-means and a model-based clustering approach specifically designed for RNA sequencing data. Moreover, when biological replicates are missing, FunPat is able to provide reproducible lists of significant genes. The application to real time series expression data shows the ability of FunPat to select differentially expressed genes with high reproducibility, indirectly confirming high precision and recall in gene selection. Moreover, the expression patterns obtained as output allow an easy interpretation of the results.
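For readers less familiar with the metrics named above, the following minimal Python sketch shows how precision, recall and false discovery rate relate when a list of selected genes is compared with a known set of differentially expressed genes, as is possible on simulated data. The gene identifiers and both selections are invented for illustration and are not results from the paper.

```python
def selection_metrics(selected, truly_de):
    """Precision, recall and FDR of a gene selection against a known truth set."""
    selected, truly_de = set(selected), set(truly_de)
    tp = len(selected & truly_de)   # true positives
    fp = len(selected - truly_de)   # false positives
    fn = len(truly_de - selected)   # false negatives
    precision = tp / (tp + fp) if selected else 0.0
    recall = tp / (tp + fn) if truly_de else 0.0
    fdr = fp / (tp + fp) if selected else 0.0  # FDR = 1 - precision
    return {"precision": precision, "recall": recall, "fdr": fdr}


# Invented example: four genes are truly differentially expressed.
truth = {"g1", "g2", "g3", "g4"}
selection_only = {"g1", "g2", "g5"}                # hypothetical stand-alone selection
integrated = {"g1", "g2", "g3", "g4", "g5", "g6"}  # hypothetical selection plus clustering

m_selection = selection_metrics(selection_only, truth)
m_integrated = selection_metrics(integrated, truth)
print(m_selection)
print(m_integrated)
```

In this constructed example the second list recovers all four true genes (recall rises from 0.5 to 1.0) while the proportion of false positives among the selected genes stays at one third, which is the kind of behaviour the statement about recall and false discovery rate describes.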

CONCLUSIONS

A novel analysis pipeline was developed to search for the main temporal patterns in classes of similarly annotated genes, improving the sensitivity of gene selection by integrating the statistical evidence of differential expression with information on temporal profiles and functional annotations. Significant genes are associated with both the most informative functional terms, which avoids redundant information, and the most representative temporal patterns, which improves the readability of the results. The FunPat package is provided in R/Bioconductor at: http://sysbiobig.dei.unipd.it/?q=node/79.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/911c/4460925/f7d65da8d8a2/1471-2164-16-S6-S2-1.jpg
