The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets.

Affiliations

UC Santa Cruz, Molecular, Cell and Developmental Biology, 1156 High Street, Santa Cruz, CA 95064, USA.

UC Santa Cruz, Genomics Institute, 1156 High Street, Santa Cruz, CA 95064, USA.

Publication information

Gigascience. 2021 Mar 13;10(3). doi: 10.1093/gigascience/giab011.

DOI:10.1093/gigascience/giab011
PMID:33712853
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7955155/
Abstract

BACKGROUND

The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis.

FINDINGS

In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]).
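Because each percentage above uses a nested denominator (unmapped out of all reads, duplicates out of mapped reads, non-exonic out of mapped non-duplicate reads), the MEND fraction of a single sample is the product of the three retained fractions. The following is a minimal Python sketch of that arithmetic, not the authors' script; the function names `mend_fraction` and `mend_depth` and the example sample are hypothetical.

```python
def mend_fraction(unmapped_of_total, duplicate_of_mapped, non_exonic_of_mapped_nondup):
    """Return MEND reads as a fraction of total reads for one sample.

    Each argument is a fraction in [0, 1] with the same denominator as in the
    abstract: unmapped / total, duplicate / mapped, and
    non-exonic / (mapped, non-duplicate).
    """
    inputs = {
        "unmapped_of_total": unmapped_of_total,
        "duplicate_of_mapped": duplicate_of_mapped,
        "non_exonic_of_mapped_nondup": non_exonic_of_mapped_nondup,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    mapped = 1.0 - unmapped_of_total                     # share of total reads
    mapped_nondup = mapped * (1.0 - duplicate_of_mapped)  # share of total reads
    return mapped_nondup * (1.0 - non_exonic_of_mapped_nondup)


def mend_depth(total_reads, unmapped_of_total, duplicate_of_mapped, non_exonic_of_mapped_nondup):
    """Return the sequencing depth in MEND reads, rounded to a whole read count."""
    fraction = mend_fraction(unmapped_of_total, duplicate_of_mapped, non_exonic_of_mapped_nondup)
    return round(total_reads * fraction)


# Hypothetical sample: 100 million total reads, 3% unmapped,
# 27% of mapped reads duplicated, 25% of the remainder non-exonic.
print(mend_fraction(0.03, 0.27, 0.25))            # about 0.531
print(mend_depth(100_000_000, 0.03, 0.27, 0.25))  # 53107500
```

The example inputs match the reported cohort medians only for illustration. Medians taken across samples need not multiply, so this result should not be expected to reproduce the reported 50% median MEND fraction.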

CONCLUSIONS

Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
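To make the MEND definition concrete at the level of individual reads, the sketch below classifies toy reads in the same order the abstract uses: unmapped first, then duplicate, then non-exonic. It is an illustration under stated assumptions, not the authors' custom script and not a description of what the Docker image does. It assumes duplicates have already been marked with the standard SAM flag bit 0x400, that unmapped reads carry bit 0x4, and that exons are given as half-open `(start, end)` intervals; the `Read` class, `classify`, and `tally` are hypothetical names, and secondary or supplementary alignments are ignored.

```python
from collections import Counter
from dataclasses import dataclass

UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped
DUPLICATE = 0x400  # SAM FLAG bit: PCR or optical duplicate


@dataclass(frozen=True)
class Read:
    """Toy stand-in for one alignment record (hypothetical, for illustration)."""
    flag: int
    chrom: str = ""
    start: int = 0  # 0-based, inclusive
    end: int = 0    # 0-based, exclusive


def classify(read, exons):
    """Assign a read to exactly one category, filtering in MEND order."""
    if read.flag & UNMAPPED:
        return "unmapped"
    if read.flag & DUPLICATE:
        return "duplicate"
    # A mapped, non-duplicate read counts as exonic if it overlaps any exon.
    for exon_start, exon_end in exons.get(read.chrom, ()):
        if read.start < exon_end and exon_start < read.end:
            return "MEND"
    return "non_exonic"


def tally(reads, exons):
    """Count reads per category."""
    return Counter(classify(read, exons) for read in reads)


exons = {"chr1": [(100, 200)]}
reads = [
    Read(UNMAPPED),                    # unmapped
    Read(DUPLICATE, "chr1", 120, 170), # mapped but marked duplicate
    Read(0, "chr1", 150, 250),         # overlaps the exon
    Read(0, "chr1", 300, 350),         # mapped, outside any exon
    Read(0, "chr2", 100, 150),         # chromosome without annotated exons
]
counts = tally(reads, exons)
print(counts["MEND"], "MEND reads out of", len(reads), "total")
```

A linear scan over exons is used here only to keep the sketch short; a real annotation would call for an interval index.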

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a74/7955155/8d88be3bca1d/giab011fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a74/7955155/579a66fb505f/giab011fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a74/7955155/4fef785ca302/giab011fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a74/7955155/269b48627723/giab011fig4.jpg
