de Sá Pablo H C G, Veras Adonney A O, Carneiro Adriana R, Pinheiro Kenny C, Pinto Anne C, Soares Siomar C, Schneider Maria P C, Azevedo Vasco, Silva Artur, Ramos Rommel T J
Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil.
Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
Gene. 2015 Jun 1;563(2):165-71. doi: 10.1016/j.gene.2015.03.033. Epub 2015 Mar 18.
Since large-scale sequencing platforms emerged in 2005, methods for decoding DNA sequences have changed dramatically, and the RNA-Sequencing (RNA-Seq) technique has likewise changed quantitative and qualitative gene expression analysis. However, the amount of data required for these analyses remains a concern because it affects the reliability of the experiments; thus, RNA depletion during sample preparation may influence the results. Moreover, because the data produced by these platforms vary in quality, quality filters are often used to remove sequences likely to contain errors and thereby increase the accuracy of the results. Removing reads with quality filters, however, may itself influence the expression profile in RNA-Seq experiments.
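As an illustration of what a quality filter does, the following is a minimal sketch in Python. It assumes a read is discarded when its mean Phred quality value falls below a threshold such as QV20 or QV30; the abstract does not state the exact criterion used in the study, so this rule, the function names, and the toy reads are all hypothetical.

```python
from typing import List, Tuple

# A read is (identifier, bases, quality string in Phred+33 encoding).
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string with the given ASCII offset."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads: List[Read], min_qv: float) -> List[Read]:
    """Keep reads whose mean quality is at least min_qv.

    Assumption: the filter acts on the mean quality of the whole read.
    Real tools may instead trim bases or use other per-read criteria.
    """
    return [r for r in reads if r[2] and mean_phred(r[2]) >= min_qv]


def percent_removed(reads: List[Read], min_qv: float) -> float:
    """Percentage of reads discarded by the filter."""
    kept = filter_reads(reads, min_qv)
    return 100.0 * (len(reads) - len(kept)) / len(reads)


# Toy data, not from the study: 'I' = QV40, '?' = QV30, '5' = QV20, '+' = QV10.
toy_reads: List[Read] = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "5555"),
    ("r3", "ACGT", "++++"),
    ("r4", "ACGT", "????"),
]

for threshold in (20, 30):
    print(f"QV{threshold}: {percent_removed(toy_reads, threshold):.1f}% of reads removed")
```

Raising the threshold discards more reads, which is why a stricter filter changes the read counts that later feed into the expression estimates.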
The present study aimed to analyze the impact of different quality filter values on RNA-Seq data from Corynebacterium pseudotuberculosis (sequenced on the SOLiD platform) and from Microcystis aeruginosa and Kineococcus radiotolerans (both sequenced on the Illumina platform). Although up to 47.9% of the reads produced by the SOLiD technology were removed when the QV20 quality filter was applied, and 15.85% were removed from the K. radiotolerans data set by the QV30 filter, the Illumina data showed the largest number of unique differentially expressed genes after the most stringent filter (QV30) was applied, with 69 genes. In contrast, for SOLiD, the acid stress condition with the QV20 filter yielded only 41 unique differentially expressed genes. Even for the highest-quality data, those of M. aeruginosa, the quality filter affected the expression profile. The most stringent quality filter generated a greater number of unique differentially expressed genes: 9 for the high molecular weight dissolved organic matter condition and 12 for the low P condition.
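The comparison of "unique" differentially expressed genes across filter settings can be sketched as a set operation. The example below assumes that a gene is unique to a setting when it is called differentially expressed under that setting and under no other; the abstract does not define the term, so this interpretation, the function name, and the gene labels are hypothetical.

```python
from typing import Dict, Set


def unique_per_filter(de_genes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each filter setting, return the genes called differentially
    expressed under that setting only (assumed meaning of "unique")."""
    unique: Dict[str, Set[str]] = {}
    for name, genes in de_genes.items():
        # Union of the gene sets from every other filter setting.
        others: Set[str] = set().union(
            *(g for n, g in de_genes.items() if n != name)
        )
        unique[name] = genes - others
    return unique


# Toy gene sets, not results from the study.
toy_calls: Dict[str, Set[str]] = {
    "unfiltered": {"geneA", "geneB", "geneC"},
    "QV20": {"geneB", "geneC", "geneD"},
    "QV30": {"geneC", "geneE", "geneF"},
}

for setting, genes in unique_per_filter(toy_calls).items():
    print(setting, len(genes), sorted(genes))
```

Counting the size of each resulting set gives a figure comparable in kind to the per-filter counts reported above.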
Even high-accuracy sequencing technologies are subject to the influence of quality filters when RNA-Seq data are evaluated using the reference-based approach.