Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes.

Author information

González Emmanuel, Joly Simon

Affiliation

Institut de recherche en biologie végétale, Université de Montréal, 4101 Sherbrooke E, Montréal, H1X 2B2, (QC), Canada.

Publication information

BMC Res Notes. 2013 Dec 3;6:503. doi: 10.1186/1756-0500-6-503.

DOI:10.1186/1756-0500-6-503

PMID:24298906

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4222115/

Abstract

BACKGROUND

High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies represent an important downstream analysis that often follows de novo transcriptome assembly. Although much attention has been given to bioinformatics tools for differential gene expression, little has yet been given to the impact of the sequence data used in these pipelines.

RESULTS

We tested how using reads of a different type from those used to assemble a de novo transcriptome (differing in both length and pairing attributes) could affect differential expression (DE) results. To investigate this, we created artificial datasets from the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses; because all samples come from the same sequencing run, any DE of genes or isoforms can be interpreted as a false positive resulting from sequence attributes. Although the false positive rate for differential gene expression does not seem to be strongly affected by sequencing strategy (maximum of 3.5%), it could reach 12.2% or 28.1% for differential isoform expression, depending on the pipeline used. The choice between a paired-end and a single-end strategy was found to have a much greater impact on false positives than sequence length.
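As a rough illustration of the design described above, the following Python sketch derives shorter and single-end datasets from the same paired-end reads and computes a false positive rate from a list of adjusted p-values. It is only a sketch under stated assumptions: the function names, the choice of keeping the first mate, the truncation lengths, and the 0.05 threshold are hypothetical and are not taken from the paper, and the rate is assumed to be the share of tested features called DE.

```python
from typing import List, Optional, Tuple

# A read pair is represented as (mate 1 sequence, mate 2 sequence).
ReadPair = Tuple[str, str]


def truncate_pairs(pairs: List[ReadPair], length: int) -> List[ReadPair]:
    """Build a shorter paired-end dataset by keeping the first `length` bases of each mate."""
    return [(r1[:length], r2[:length]) for r1, r2 in pairs]


def to_single_end(pairs: List[ReadPair], length: Optional[int] = None) -> List[str]:
    """Build a single-end dataset by keeping only mate 1 (assumption), optionally truncated."""
    return [r1 if length is None else r1[:length] for r1, _ in pairs]


def false_positive_rate(adjusted_p_values: List[float], alpha: float = 0.05) -> float:
    """Share of tested features called DE when no true DE is expected.

    Because all derived datasets come from the same reads, every feature
    called DE is counted as a false positive. `alpha` is a hypothetical threshold.
    """
    if not adjusted_p_values:
        return 0.0
    called = sum(1 for p in adjusted_p_values if p < alpha)
    return called / len(adjusted_p_values)


if __name__ == "__main__":
    original = [("ACGTACGTAC", "TTGGCCAATT"), ("GGGTTTAAAC", "CCCAAATTTG")]
    short_paired = truncate_pairs(original, 5)
    short_single = to_single_end(original, 5)
    print(short_paired, short_single)

    # Toy adjusted p-values for eight isoforms (made-up numbers, not real results).
    toy_p_values = [0.01, 0.2, 0.5, 0.03, 0.9, 0.7, 0.6, 0.8]
    print(false_positive_rate(toy_p_values))
```

In practice the adjusted p-values would come from whichever DE pipeline is being evaluated; the toy values above only exercise the arithmetic.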

CONCLUSION

In light of these false positive rates, we recommend using paired-end rather than single-end sequences in differential expression studies, even though the impact is less serious for differential gene expression.
