Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures.

Author information

Simoneau Joël, Gosselin Ryan, Scott Michelle S

Affiliations

Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada.

Department of Chemical & Biotechnological Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada.

Publication information

NAR Genom Bioinform. 2020 Jun 29;2(2):lqaa043. doi: 10.1093/nargab/lqaa043. eCollection 2020 Jun.

Abstract

RNA-seq is a modular experimental and computational approach aimed at identifying and quantifying RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also highlights the importance of methodological thoroughness. Liberty of approach comes with the responsibility of choices, and such choices must be informed. Here, we present an approach that identifies gene group-specific quantification biases in current RNA-seq software and references by processing datasets using diverse RNA-seq computational pipelines, and by decomposing these expression datasets with an independent component analysis matrix factorization method. By exploring the RNA-seq pipeline using this systemic approach, we identify genome annotations as a design choice that affects quantification results to the same extent as the choice of aligners and quantifiers. We also show that the different choices in RNA-seq methodology are not independent, identifying interactions between genome annotations and quantification software. Genes were mainly affected by differences in their sequence, by overlapping genes, and by genes with similar sequences. Our approach offers an explanation for the observed biases by identifying the common features used differently by the software and references, thereby providing leads for the betterment of RNA-seq methodology.
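
The claim that methodological choices "are not independent" can be illustrated with a toy 2x2 factorial calculation. The sketch below is not the authors' analysis (which uses independent component analysis on real pipelines); the factor level names and expression values are hypothetical, synthetic placeholders chosen only to show how a main effect differs from an interaction.

```python
from statistics import mean

# Hypothetical factor levels; placeholders, not the tools compared in the study.
ANNOTATIONS = ("annotation_A", "annotation_B")
QUANTIFIERS = ("quantifier_X", "quantifier_Y")

# Synthetic expression estimates (arbitrary units) for one gene,
# one value per (annotation, quantifier) pipeline. Not real data.
expression = {
    ("annotation_A", "quantifier_X"): 10.0,
    ("annotation_A", "quantifier_Y"): 12.0,
    ("annotation_B", "quantifier_X"): 11.0,
    ("annotation_B", "quantifier_Y"): 19.0,
}


def factorial_effects(values):
    """Return main effects and the interaction for a 2x2 full factorial design."""
    a_lo, a_hi = ANNOTATIONS
    q_lo, q_hi = QUANTIFIERS

    # Main effect: change in the mean response when one factor switches level,
    # averaged over the levels of the other factor.
    annotation = (mean(values[(a_hi, q)] for q in QUANTIFIERS)
                  - mean(values[(a_lo, q)] for q in QUANTIFIERS))
    quantifier = (mean(values[(a, q_hi)] for a in ANNOTATIONS)
                  - mean(values[(a, q_lo)] for a in ANNOTATIONS))

    # Interaction: half the difference between the quantifier effect
    # measured under each annotation. Zero means the choices are independent.
    interaction = ((values[(a_hi, q_hi)] - values[(a_hi, q_lo)])
                   - (values[(a_lo, q_hi)] - values[(a_lo, q_lo)])) / 2

    return {"annotation": annotation, "quantifier": quantifier,
            "interaction": interaction}


effects = factorial_effects(expression)
print(effects)
```

With these synthetic values the interaction term is nonzero, meaning the effect of switching quantifier depends on which annotation is used; a gene with independent choices would give an interaction of zero.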

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/448dfd00351a/lqaa043fig1.jpg
