

Variability in estimated gene expression among commonly used RNA-seq pipelines.

Affiliation

Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA.

Publication information

Sci Rep. 2020 Feb 17;10(1):2734. doi: 10.1038/s41598-020-59516-z.

Abstract

RNA-sequencing data is widely used to identify disease biomarkers and therapeutic targets using numerical methods such as clustering, classification, regression, and differential expression analysis. Such approaches rely on the assumption that mRNA abundance estimates from RNA-seq are reliable estimates of true expression levels. Here, using data from five RNA-seq processing pipelines applied to 6,690 human tumor and normal tissues, we show that nearly 88% of protein-coding genes have similar gene expression profiles across all pipelines. However, for >12% of protein-coding genes, current best-in-class RNA-seq processing pipelines differ in their abundance estimates by more than four-fold when applied to exactly the same samples and the same set of RNA-seq reads. Expression fold changes are similarly affected. Many of the impacted genes are widely studied disease-associated genes. We show that impacted genes exhibit diverse patterns of discordance among pipelines, suggesting that many inter-pipeline differences contribute to overall uncertainty in mRNA abundance estimates. A concerted, community-wide effort will be needed to develop gold standards for estimating the mRNA abundance of the discordant genes reported here. In the meantime, our list of discordantly evaluated genes provides an important resource for robust marker discovery and target selection.
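
To make the "more than four-fold" criterion concrete, the following minimal Python sketch flags genes whose abundance estimates differ by more than four-fold between any two pipelines for the same sample. The abstract does not specify how the authors computed discordance, so the max/min ratio, the pseudocount of 1, the function names, and the gene, pipeline, and abundance values below are all illustrative assumptions rather than the paper's method or data.

```python
import math


def max_log2_fold_difference(estimates, pseudocount=1.0):
    """Largest pairwise log2 ratio among one gene's per-pipeline estimates.

    The pseudocount (an assumption, not taken from the paper) avoids log(0).
    """
    logs = [math.log2(value + pseudocount) for value in estimates.values()]
    return max(logs) - min(logs)


def discordant_genes(abundance, fold_threshold=4.0, pseudocount=1.0):
    """Return genes whose pipelines disagree by more than fold_threshold-fold."""
    limit = math.log2(fold_threshold)
    return sorted(
        gene
        for gene, estimates in abundance.items()
        if max_log2_fold_difference(estimates, pseudocount) > limit
    )


# Made-up abundance values for a single sample; all names are placeholders.
abundance = {
    "GENE_A": {"pipeline_1": 10.0, "pipeline_2": 12.0, "pipeline_3": 9.0},
    "GENE_B": {"pipeline_1": 3.0, "pipeline_2": 63.0, "pipeline_3": 7.0},
}

print(discordant_genes(abundance))
```

In this toy input only GENE_B exceeds the threshold, because its highest and lowest estimates differ 16-fold after adding the pseudocount. A real analysis would repeat the comparison across all samples and would need to reconcile gene identifiers and abundance units between pipelines first.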


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a207/7026138/a5c3bd280275/41598_2020_59516_Fig1_HTML.jpg
