A comparison of transcriptome analysis methods with reference genome.

Affiliations

Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China.

Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China.

Publication information

BMC Genomics. 2022 Mar 25;23(1):232. doi: 10.1186/s12864-022-08465-0.

Abstract

BACKGROUND

RNA-seq technology has become more widely used, and the number of available analysis procedures has grown in recent years. Selecting an appropriate workflow has become an important issue for researchers in the field.

METHODS

In our study, six popular analytical procedures/pipelines were compared using four RNA-seq datasets from mouse, human, rat, and macaque, respectively. Gene expression values, fold changes of gene expression, and statistical significance were evaluated to compare the similarities and differences among the six procedures. qRT-PCR was performed to validate the differentially expressed genes (DEGs) from all six procedures.

RESULTS

Cufflinks-Cuffdiff demands the most computing resources and Kallisto-Sleuth the least. Gene expression values, fold changes, and the p and q values from differential expression (DE) analysis are highly correlated among procedures that use HTseq for quantification. For genes with medium expression abundance, the expression values determined by the different procedures were similar. The major differences in expression values come from genes with particularly high or low expression levels. HISAT2-StringTie-Ballgown is more sensitive to genes with low expression levels, while Kallisto-Sleuth may only be useful for evaluating genes with medium to high abundance. When the same thresholds for fold change and p value are applied in DE analysis, StringTie-Ballgown produces the fewest DEGs, while HTseq-DESeq2, -edgeR, or -limma generally produce more DEGs. The performance of Cufflinks-Cuffdiff and Kallisto-Sleuth varies across datasets. For DEGs with medium expression levels, the biological verification rates were similar among all procedures.
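The DEG counts reported here depend on applying identical fold-change and p-value cutoffs to the output of each procedure. The following is a minimal sketch of that filtering step, assuming each procedure's results have already been reduced to (gene, log2 fold change, p value) records. The cutoffs (|log2FC| >= 1, p < 0.05), the names `DEResult` and `select_degs`, and the toy records are illustrative assumptions and are not taken from the study.

```python
from typing import Iterable, NamedTuple, Set


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fc: float
    p_value: float


def select_degs(
    results: Iterable[DEResult],
    min_abs_log2_fc: float = 1.0,  # assumed cutoff, not from the study
    max_p: float = 0.05,           # assumed cutoff, not from the study
) -> Set[str]:
    """Return the genes that pass both the fold-change and p-value cutoffs."""
    return {
        r.gene
        for r in results
        if abs(r.log2_fc) >= min_abs_log2_fc and r.p_value < max_p
    }


# Toy records for illustration only; these are not data from the study.
toy_results = [
    DEResult("geneA", 2.3, 0.001),   # passes both cutoffs
    DEResult("geneB", 0.4, 0.0001),  # fold change too small
    DEResult("geneC", -1.7, 0.03),   # passes both cutoffs (down-regulated)
    DEResult("geneD", 3.0, 0.20),    # p value too large
]

degs = select_degs(toy_results)
print(sorted(degs))
```

Running the same function with the same cutoffs on each procedure's table keeps the resulting DEG counts comparable across procedures.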

CONCLUSION

Results are highly correlated among RNA-seq analysis procedures that use HTseq for quantification. Differences in gene expression values mainly come from genes with particularly high or low expression levels. Moreover, the biological validation rates of DEGs from all six procedures were similar for genes with medium expression levels. Investigators can choose analytical procedures according to their available computing resources, or according to whether genes with high or low expression levels are of interest. If computing resources are abundant, one can run multiple procedures and take the intersection of the results to obtain the most reliable DEGs, or combine the results to obtain a more comprehensive DE profile of the transcriptome.
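The two strategies mentioned in the conclusion, intersecting results for reliability or combining them for coverage, can be sketched with plain set operations. This is only an illustration: the gene names, the DEG sets, and the helper names `consensus_degs` and `combined_degs` are hypothetical, and "combination" is interpreted here as a set union, which is an assumption.

```python
from typing import Dict, Iterable, Set

# Hypothetical DEG sets per procedure; gene names are placeholders,
# not results from the study.
degs_by_procedure: Dict[str, Set[str]] = {
    "HTseq-DESeq2": {"geneA", "geneB", "geneC"},
    "HISAT2-StringTie-Ballgown": {"geneA", "geneC"},
    "Kallisto-Sleuth": {"geneA", "geneC", "geneD"},
}


def consensus_degs(deg_sets: Iterable[Set[str]]) -> Set[str]:
    """Genes called as DEGs by every procedure (intersection)."""
    sets = list(deg_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


def combined_degs(deg_sets: Iterable[Set[str]]) -> Set[str]:
    """Genes called as DEGs by at least one procedure (union)."""
    return set().union(*deg_sets)


consensus = consensus_degs(degs_by_procedure.values())
combined = combined_degs(degs_by_procedure.values())

print("consensus:", sorted(consensus))
print("combined:", sorted(combined))
```

The intersection shrinks as more procedures are added, while the union grows, so the choice reflects a trade-off between confidence in each DEG and breadth of the DE profile.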

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c47e/8957167/1ea02dd84617/12864_2022_8465_Fig1_HTML.jpg
