An integrative method to normalize RNA-Seq data.

Author information

Cyril Filloux, Cédric Meersseman, Romain Philippe, Lionel Forestier, Christophe Klopp, Dominique Rocha, Abderrahman Maftah, Daniel Petit

Affiliations

INRA, UMR1061, Unité Génétique Moléculaire Animale, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France.

Publication information

BMC Bioinformatics. 2014 Jun 14;15:188. doi: 10.1186/1471-2105-15-188.

Abstract

BACKGROUND

Transcriptome sequencing is a powerful tool for measuring gene expression but, as with other technologies, various artifacts and biases affect the quantification. To correct some of them, several normalization approaches have emerged, differing both in the statistical strategy employed and in the types of bias corrected. However, there is no clear standard normalization method.

RESULTS

We present a novel methodology to normalize RNA-Seq data that takes into account transcript size, GC content, and sequencing depth, which are the major quantification-related biases. First, we found that the expression level of transcripts shorter than 600 bp is underestimated, while that of longer transcripts is overestimated, increasingly so as length grows. Second, as was already well known, the higher the GC content (>50%), the more transcript expression is underestimated. Third, we demonstrated that sequencing depth affects the size bias and proposed a correction that allows expression levels to be compared among many samples. The efficiency of our approach was then tested by comparing the correlation between normalized RNA-Seq data and qRT-PCR expression measurements. All the steps are automated in a program written in Perl and available on request.
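
The abstract does not spell out the correction formulas, so the following is only a minimal sketch of the validation idea, not the authors' Perl program. It uses RPKM (reads per kilobase per million mapped reads), a standard length-and-depth normalization swapped in here as a stand-in, and a Pearson correlation against qRT-PCR values. The gene names, read counts, lengths, and qRT-PCR numbers are hypothetical, and no GC-content correction is attempted.

```python
import math


def rpkm(counts, lengths_bp):
    """Standard RPKM: scale read counts by transcript length and library size.

    This is a generic baseline, NOT the normalization proposed in the paper.
    """
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (total_reads * lengths_bp[gene])
        for gene in counts
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical toy data: raw read counts, transcript lengths (bp),
    # and relative qRT-PCR expression for the same genes.
    counts = {"geneA": 400, "geneB": 1800, "geneC": 2600, "geneD": 5200}
    lengths_bp = {"geneA": 500, "geneB": 1500, "geneC": 2000, "geneD": 3500}
    qpcr = {"geneA": 1.0, "geneB": 1.6, "geneC": 1.9, "geneD": 2.4}

    normalized = rpkm(counts, lengths_bp)
    genes = sorted(counts)
    # Compare on a log2 scale, as is common for expression data.
    r = pearson(
        [math.log2(normalized[g]) for g in genes],
        [math.log2(qpcr[g]) for g in genes],
    )
    print(f"Pearson r between log2 RPKM and log2 qRT-PCR: {r:.3f}")
```

Under this kind of check, one normalization would be preferred over another if it gives a higher correlation with the qRT-PCR measurements for the same genes.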

CONCLUSIONS

The methodology presented in this article identifies and corrects different biases that influence RNA-Seq quantification, and provides more accurate estimates of gene expression levels. This method can be applied to compare expression quantifications from many samples, preferably from the same tissue. To compare samples from different tissues, a calibration using several reference genes will be required.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71d4/4067528/0a33496aa56c/1471-2105-15-188-1.jpg
