A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads.

Authors

Górczak Katarzyna, Claesen Jürgen, Burzykowski Tomasz

Affiliations

Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium.

Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, Poznań, Poland.

Publication information

J Comput Biol. 2020 Aug;27(8):1232-1247. doi: 10.1089/cmb.2019.0272. Epub 2019 Dec 31.

DOI:10.1089/cmb.2019.0272

PMID:31895597

Abstract

RNA sequencing (RNA-seq) is widely used to study gene, transcript, or exon expression. To quantify expression levels, millions of short sequenced reads must be mapped back to a reference genome or transcriptome. Read mapping finds a location to which a read is identical or similar. Based on this alignment, expression summaries, that is, read counts, are generated. However, a read may match multiple locations. Such ambiguously mapped reads (multireads) are often ignored in the analysis, which is a potential loss of information and may bias expression estimates. We present the general principles underlying multiread allocation and unbiased estimation of the expression level of genes, exons, or transcripts in the presence of multiply mapped reads. These principles are derived from a theoretical concept that identifies important sources of information, such as the number of uniquely mapped reads, the total target length, and the length of the shared target regions. We show with simulation studies that methods incorporating some or all of these sources of information estimate the expression levels of genes, exons, and/or transcripts with higher precision and accuracy than methods that do not use this information. We identify important sources of information that should be taken into account by methods that estimate the abundance of genes, exons, and/or transcripts to achieve good precision and accuracy.
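To make the three sources of information named in the abstract concrete, the following is a minimal sketch of one simple allocation heuristic: multireads from a shared region are split among targets in proportion to each target's unique-read density. This is an illustrative assumption, not the estimator derived in the paper; the function names `allocate_multireads` and `estimate_abundance` and all parameters are hypothetical.

```python
def allocate_multireads(unique_counts, unique_lengths, shared_count):
    """Split multireads among targets in proportion to unique-read density.

    Hypothetical heuristic for illustration only, not the paper's method.
    unique_counts[i]  : reads mapped uniquely to target i
    unique_lengths[i] : length (bases) of the region unique to target i
    shared_count      : reads mapped ambiguously to the shared region
    """
    if any(length <= 0 for length in unique_lengths):
        raise ValueError("every target needs a unique region of positive length")
    # Unique reads per base: a length-normalised measure of expression.
    densities = [c / l for c, l in zip(unique_counts, unique_lengths)]
    total = sum(densities)
    if total == 0:
        # No unique evidence for any target: fall back to an equal split.
        return [shared_count / len(densities)] * len(densities)
    return [shared_count * d / total for d in densities]


def estimate_abundance(unique_counts, unique_lengths, shared_length, shared_count):
    """Reads per base over each target's total length (unique + shared region)."""
    allocated = allocate_multireads(unique_counts, unique_lengths, shared_count)
    return [
        (u + a) / (l + shared_length)
        for u, a, l in zip(unique_counts, allocated, unique_lengths)
    ]


if __name__ == "__main__":
    # Two targets sharing a 100-base region that attracted 40 multireads.
    print(allocate_multireads([300, 100], [1000, 1000], 40))
    print(estimate_abundance([300, 100], [1000, 1000], 100, 40))
```

In this toy input the targets have unique-read densities of 0.3 and 0.1 reads per base, so the 40 multireads are split roughly 30 to 10, and the per-base abundance over the full target length stays at about 0.3 and 0.1. Discarding the multireads instead would give 300/1100 and 100/1100, which understates both targets.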

Similar articles

A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads.

J Comput Biol. 2020 Aug;27(8):1232-1247. doi: 10.1089/cmb.2019.0272. Epub 2019 Dec 31.

Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

BMC Bioinformatics. 2015 Sep 3;16:278. doi: 10.1186/s12859-015-0704-z.

A fuzzy method for RNA-Seq differential expression analysis in presence of multireads.

BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):345. doi: 10.1186/s12859-016-1195-2.

Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression.

Bioinformatics. 2018 Jul 1;34(13):2177-2184. doi: 10.1093/bioinformatics/bty078.

BM-map: Bayesian mapping of multireads for next-generation sequencing data.

Biometrics. 2011 Dec;67(4):1215-24. doi: 10.1111/j.1541-0420.2011.01605.x. Epub 2011 Apr 22.

A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification.

BMC Genomics. 2015 Feb 18;16(1):97. doi: 10.1186/s12864-015-1308-8.

Ornaments for efficient allele-specific expression estimation with bias correction.

Am J Hum Genet. 2024 Aug 8;111(8):1770-1781. doi: 10.1016/j.ajhg.2024.06.014. Epub 2024 Jul 23.

Knowledge-based reconstruction of mRNA transcripts with short sequencing reads for transcriptome research.

PLoS One. 2012;7(2):e31440. doi: 10.1371/journal.pone.0031440. Epub 2012 Feb 1.

Direct full-length RNA sequencing reveals unexpected transcriptome complexity during development.

Genome Res. 2020 Feb;30(2):287-298. doi: 10.1101/gr.251512.119. Epub 2020 Feb 5.

Cited by

Bulked Segregant RNA Sequencing Revealed Difference Between Virulent and Avirulent Brown Planthoppers.

Front Plant Sci. 2022 Apr 14;13:843227. doi: 10.3389/fpls.2022.843227. eCollection 2022.
