Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures.

Authors

Simoneau Joël, Gosselin Ryan, Scott Michelle S

Affiliations

Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada.

Department of Chemical & Biotechnological Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada.

Publication information

NAR Genom Bioinform. 2020 Jun 29;2(2):lqaa043. doi: 10.1093/nargab/lqaa043. eCollection 2020 Jun.

DOI:10.1093/nargab/lqaa043
PMID:33575596
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671328/
Abstract

RNA-seq is a modular experimental and computational approach that aims to identify and quantify RNA molecules. The modularity of RNA-seq technology allows the protocol to be adapted to new ways of exploring RNA biology, but it also makes methodological thoroughness essential. Liberty of approach comes with the responsibility of choices, and such choices must be informed. Here, we present an approach that identifies gene group-specific quantification biases in current RNA-seq software and references by processing datasets with diverse RNA-seq computational pipelines and decomposing the resulting expression datasets with independent component analysis, a matrix factorization method. By exploring the RNA-seq pipeline with this systemic approach, we identify genome annotation as a design choice that affects quantification results to the same extent as the choice of aligner and quantifier. We also show that the different choices in RNA-seq methodology are not independent, identifying interactions between genome annotations and quantification software. Genes were mainly affected by differences in their sequence, by overlapping genes and by genes with similar sequences. Our approach offers an explanation for the observed biases by identifying the common features used differently by the software and references, thereby providing leads for the betterment of RNA-seq methodology.
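
The factorial idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The level names (`annotation_A`, `aligner_A`, and so on), the helper functions, and the toy numbers are all hypothetical, chosen only to show how a full factorial design enumerates pipelines and how an interaction between two choices differs from two independent effects. The independent component analysis step is not shown.

```python
from itertools import product

# Hypothetical factor levels: the abstract names the three factors
# (annotation, aligner, quantifier) but not these placeholder labels.
FACTORS = {
    "annotation": ["annotation_A", "annotation_B"],
    "aligner": ["aligner_A", "aligner_B"],
    "quantifier": ["quantifier_A", "quantifier_B"],
}


def full_factorial(factors):
    """Return one dict per pipeline, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def interaction_2x2(cell):
    """Interaction contrast for two two-level factors.

    cell[(i, j)] is the quantification of one gene at level i of the first
    factor and level j of the second (i, j in {0, 1}). The result is zero
    when the two choices act independently (purely additive effects).
    """
    effect_at_j0 = cell[(1, 0)] - cell[(0, 0)]
    effect_at_j1 = cell[(1, 1)] - cell[(0, 1)]
    return effect_at_j1 - effect_at_j0


pipelines = full_factorial(FACTORS)

# Toy values for a single gene (illustrative only, not measured data).
additive = {(0, 0): 10, (1, 0): 12, (0, 1): 13, (1, 1): 15}
interacting = {(0, 0): 10, (1, 0): 12, (0, 1): 13, (1, 1): 20}

print(len(pipelines))                # number of pipelines in the design
print(interaction_2x2(additive))     # independent choices
print(interaction_2x2(interacting))  # effect of one choice depends on the other
```

With two levels per factor the design contains 2 × 2 × 2 = 8 pipelines. A nonzero contrast for a gene means that the effect of switching one component, such as the annotation, depends on which level of the other component, such as the quantifier, is in use.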

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/448dfd00351a/lqaa043fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/47e88acf7ff0/lqaa043fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/3a8427fb4ae3/lqaa043fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/0f806466fba7/lqaa043fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/7671328/41a747fb4b3b/lqaa043fig5.jpg
