IVT-seq reveals extreme bias in RNA sequencing.

Authors

Lahens Nicholas F, Kavakli Ibrahim Halil, Zhang Ray, Hayer Katharina, Black Michael B, Dueck Hannah, Pizarro Angel, Kim Junhyong, Irizarry Rafael, Thomas Russell S, Grant Gregory R, Hogenesch John B

Publication information

Genome Biol. 2014 Jun 30;15(6):R86. doi: 10.1186/gb-2014-15-6-r86.

DOI:10.1186/gb-2014-15-6-r86

PMID:24981968

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4197826/

Abstract

BACKGROUND

RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value.

RESULTS

We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation.

CONCLUSIONS

These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
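The within-transcript coverage differences reported in the Results (more than two-fold for 50% of transcripts, more than 10-fold for 10%) can be illustrated with a minimal sketch. The abstract does not state how the authors computed fold differences, so the max/min ratio, the pseudocount, the end trimming, and the function names `coverage_fold_change` and `fraction_above` are all assumptions made for illustration. The depth values are invented, not data from the paper.

```python
from typing import Sequence


def coverage_fold_change(
    depths: Sequence[float], trim: int = 0, pseudocount: float = 1.0
) -> float:
    """Ratio of the highest to the lowest per-base depth within one transcript.

    Assumption: fold difference is defined as (max + pseudocount) / (min + pseudocount).
    `trim` drops that many positions from each end, where coverage is expected
    to fall off for reasons unrelated to the bias of interest.
    """
    core = depths[trim:len(depths) - trim] if trim > 0 else depths
    if len(core) == 0:
        raise ValueError("no positions left after trimming")
    return (max(core) + pseudocount) / (min(core) + pseudocount)


def fraction_above(fold_changes: Sequence[float], threshold: float) -> float:
    """Fraction of transcripts whose fold change exceeds `threshold`."""
    if len(fold_changes) == 0:
        raise ValueError("fold_changes is empty")
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


if __name__ == "__main__":
    # Hypothetical per-base depths for three transcripts (illustrative only).
    transcripts = {
        "tx_uniform": [50, 52, 49, 51, 50],
        "tx_moderate": [20, 60, 45, 30, 25],
        "tx_extreme": [2, 400, 380, 15, 5],
    }
    folds = [coverage_fold_change(d) for d in transcripts.values()]
    print("fraction > 2-fold:", fraction_above(folds, 2.0))
    print("fraction > 10-fold:", fraction_above(folds, 10.0))
```

For an in vitro transcribed RNA with perfectly even representation, this ratio would be close to 1. The larger it is, the more unevenly the bases of that transcript are covered.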
