Effect of method of deduplication on estimation of differential gene expression using RNA-seq.

Authors

Klepikova Anna V, Kasianov Artem S, Chesnokov Mikhail S, Lazarevich Natalia L, Penin Aleksey A, Logacheva Maria

Affiliations

Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.

A. N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia.

Publication information

PeerJ. 2017 Mar 16;5:e3091. doi: 10.7717/peerj.3091. eCollection 2017.

DOI:10.7717/peerj.3091
PMID:28321364
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5357343/
Abstract

BACKGROUND

RNA-seq is a useful tool for analysis of gene expression. However, its robustness is greatly affected by a number of artifacts. One of them is the presence of duplicated reads.

RESULTS

To infer the influence of different methods of removal of duplicated reads on estimation of gene expression in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four protocols of data analysis were applied to each sample: processing without deduplication, deduplication using a method implemented in SAMtools, and deduplication based on one or two molecular indices (MI). We also analyzed the influence of sequencing layout (single read or paired end) and read length. We found that deduplication without MI greatly affects estimated expression values; this effect is the most pronounced for highly expressed genes.
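The four protocols differ only in what counts as "the same read". The following is a minimal sketch of that idea, not the authors' scripts: the read tuples, gene names, and MI sequences are hypothetical toy values, and the coordinate-only rule is a simplified stand-in for the SAMtools method, which considers more than the start position.

```python
from collections import Counter

# Toy aligned reads: (gene, start, mi1, mi2). All values are hypothetical.
reads = [
    ("ALB", 100, "ACGT", "TTAG"),
    ("ALB", 100, "ACGT", "TTAG"),  # same position, same MIs: a true PCR duplicate
    ("ALB", 100, "GGCA", "TTAG"),  # same position, different first MI
    ("ALB", 100, "GGCA", "CCAT"),  # differs from the previous read only in the second MI
    ("ALB", 250, "ACGT", "TTAG"),
    ("AFP", 40, "TTTT", "AAAA"),
]

def count_per_gene(reads, key=None):
    """Count reads per gene, keeping one read per distinct key.

    key=None means no deduplication: every read is counted.
    """
    if key is None:
        kept = list(reads)
    else:
        first_seen = {}
        for read in reads:
            first_seen.setdefault(key(read), read)  # keep the first read per key
        kept = list(first_seen.values())
    return dict(Counter(read[0] for read in kept))

# The four protocols, expressed as different duplicate keys.
no_dedup = count_per_gene(reads)
by_position = count_per_gene(reads, key=lambda r: (r[0], r[1]))
by_one_mi = count_per_gene(reads, key=lambda r: (r[0], r[1], r[2]))
by_two_mi = count_per_gene(reads, key=lambda r: (r[0], r[1], r[2], r[3]))

for name, counts in [
    ("no deduplication", no_dedup),
    ("position only", by_position),
    ("position + one MI", by_one_mi),
    ("position + two MIs", by_two_mi),
]:
    print(f"{name:20s} {counts}")
```

In this toy input the position-only rule collapses four reads at the same start into one, while adding MIs to the key recovers reads that came from distinct molecules.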

CONCLUSION

The use of unique molecular identifiers greatly improves accuracy of RNA-seq analysis, especially for highly expressed genes. We developed a set of scripts that enable handling of MI and their incorporation into RNA-seq analysis pipelines. Deduplication without MI affects results of differential gene expression analysis, producing a high proportion of false negative results. The absence of duplicate read removal is biased towards false positives. In those cases where using MI is not possible, we recommend using paired-end sequencing layout.
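One way to see why coordinate-only deduplication could hide real differences in highly expressed genes is a simple occupancy model. This is an illustrative assumption, not a model stated in the abstract: molecules are assumed to fall uniformly at random on a fixed number of distinguishable coordinates, and the gene size, insert-size range, and molecule counts below are hypothetical.

```python
def expected_unique(n_molecules, n_keys):
    """Expected number of distinct keys occupied when n_molecules
    fall uniformly at random on n_keys possible keys."""
    return n_keys * (1.0 - (1.0 - 1.0 / n_keys) ** n_molecules)

def apparent_fold_change(n_low, n_high, n_keys):
    """Fold change seen after deduplication that keeps one read per key."""
    return expected_unique(n_high, n_keys) / expected_unique(n_low, n_keys)

# Hypothetical gene with 1000 possible start positions and a true 4-fold change.
n_low, n_high = 2000, 8000
true_fold_change = n_high / n_low

# Single-end: the duplicate key is the start position only.
single_end = apparent_fold_change(n_low, n_high, n_keys=1000)

# Paired-end: the key is (start, fragment end); assume 200 possible insert sizes.
paired_end = apparent_fold_change(n_low, n_high, n_keys=1000 * 200)

print(f"true fold change:        {true_fold_change:.2f}")
print(f"single-end, after dedup: {single_end:.2f}")
print(f"paired-end, after dedup: {paired_end:.2f}")
```

Under these assumptions the single-end estimate drops to about 1.2 because nearly every start position is already occupied in both samples, while the paired-end estimate stays at about 3.9. That is consistent with the reported false negatives and with the recommendation to use paired-end layout when MIs are unavailable.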


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26d2/5357343/071924dd2816/peerj-05-3091-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26d2/5357343/37104429f45d/peerj-05-3091-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26d2/5357343/10a61bb405a9/peerj-05-3091-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26d2/5357343/c9a2b7cd4cfc/peerj-05-3091-g004.jpg
