Accurate estimation of expression levels of homologous genes in RNA-seq experiments.

Author information

Paşaniuc Bogdan, Zaitlen Noah, Halperin Eran

Affiliation

Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA.

Publication information

J Comput Biol. 2011 Mar;18(3):459-68. doi: 10.1089/cmb.2010.0259.

PMID:21385047

Abstract

Next-generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naive algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data, and we develop maximum likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
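To make the multi-mapping problem concrete, here is a generic expectation-maximization (EM) sketch of maximum-likelihood read allocation among homologous genes. It is an illustration under stated assumptions, not the authors' model. It assumes each read's candidate loci and mismatch counts are already known, that a read starts uniformly along its gene of origin, and that a single per-base mismatch rate `err` stands in for both sequencing errors and the individual's divergence from the reference. The function and parameter names (`read_likelihood`, `em_expression`, `alignments`, `err`) are hypothetical.

```python
def read_likelihood(mismatches, read_len, gene_len, err):
    """P(read | gene) under the assumed model: a uniform start position on the
    gene, times an independent per-base mismatch probability err."""
    return (1.0 / gene_len) * (err ** mismatches) * ((1.0 - err) ** (read_len - mismatches))


def em_expression(alignments, gene_lengths, read_len, err=0.01, iters=200, tol=1e-10):
    """Estimate alpha_g, the fraction of reads originating from each gene, by EM.

    alignments:   one dict per read, {gene: mismatch_count} over its candidate loci
    gene_lengths: {gene: length}
    """
    genes = sorted(gene_lengths)
    alpha = {g: 1.0 / len(genes) for g in genes}  # uniform starting point
    n = len(alignments)
    for _ in range(iters):
        # E-step: split each read across its candidate genes in proportion to
        # alpha_g * P(read | g), then accumulate expected read counts per gene.
        counts = {g: 0.0 for g in genes}
        for hits in alignments:
            weights = {
                g: alpha[g] * read_likelihood(m, read_len, gene_lengths[g], err)
                for g, m in hits.items()
            }
            total = sum(weights.values())
            for g, w in weights.items():
                counts[g] += w / total
        # M-step: the new abundance of each gene is its expected share of all reads.
        new_alpha = {g: counts[g] / n for g in genes}
        delta = max(abs(new_alpha[g] - alpha[g]) for g in genes)
        alpha = new_alpha
        if delta < tol:
            break
    return alpha


lengths = {"A": 1000, "B": 1000}
# 60 reads unique to A, 20 unique to B, 20 matching both copies perfectly.
reads = [{"A": 0}] * 60 + [{"B": 0}] * 20 + [{"A": 0, "B": 0}] * 20
print(em_expression(reads, lengths, read_len=50))  # approximately A: 0.75, B: 0.25
```

Dropping the 20 shared reads discards a fifth of the data, and splitting them uniformly gives 0.70/0.30. EM keeps all reads and converges to 0.75/0.25, the ratio implied by the uniquely mapped reads. If a shared read matched one copy with fewer mismatches, the mismatch term would shift its weight toward that copy. This is the kind of reference-divergence effect the abstract says the authors' model accounts for.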

Similar articles

Accurate estimation of expression levels of homologous genes in RNA-seq experiments.

J Comput Biol. 2011 Mar;18(3):459-68. doi: 10.1089/cmb.2010.0259.

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads.

BMC Genomics. 2015;16 Suppl 7(Suppl 7):S14. doi: 10.1186/1471-2164-16-S7-S14. Epub 2015 Jun 11.

In Silico HLA Typing Using Standard RNA-Seq Sequence Reads.

Methods Mol Biol. 2015;1310:247-58. doi: 10.1007/978-1-4939-2690-9_20.

Trimming of sequence reads alters RNA-Seq gene expression estimates.

BMC Bioinformatics. 2016 Feb 25;17:103. doi: 10.1186/s12859-016-0956-2.

Detection of high variability in gene expression from single-cell RNA-seq profiling.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):508. doi: 10.1186/s12864-016-2897-6.

DAFS: a data-adaptive flag method for RNA-sequencing data to differentiate genes with low and high expression.

BMC Bioinformatics. 2014 Mar 31;15:92. doi: 10.1186/1471-2105-15-92.

RNA-Seq for transcriptome analysis in non-model plants.

Methods Mol Biol. 2013;1069:43-58. doi: 10.1007/978-1-62703-613-9_4.

Towards next generation CHO cell biology: Bioinformatics methods for RNA-Seq-based expression profiling.

Biotechnol J. 2015 Jul;10(7):950-66. doi: 10.1002/biot.201500107. Epub 2015 Jun 9.

MGMR: leveraging RNA-Seq population data to optimize expression estimation.

BMC Bioinformatics. 2012 Apr 19;13 Suppl 6(Suppl 6):S2. doi: 10.1186/1471-2105-13-S6-S2.

QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

BMC Genomics. 2016 Jan 8;17:39. doi: 10.1186/s12864-015-2356-9.

Cited by

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs.

Life Sci Alliance. 2019 Jan 17;2(1). doi: 10.26508/lsa.201800175. Print 2019 Feb.

miR-MaGiC improves quantification accuracy for small RNA-seq.

BMC Res Notes. 2018 May 15;11(1):296. doi: 10.1186/s13104-018-3418-2.

Evaluation of Bioinformatics Approaches for Next-Generation Sequencing Analysis of microRNAs with a Toxicogenomics Study Design.

Front Genet. 2018 Feb 6;9:22. doi: 10.3389/fgene.2018.00022. eCollection 2018.

Upregulated WEE1 protects endothelial cells of colorectal cancer liver metastases.

Oncotarget. 2017 Jun 27;8(26):42288-42299. doi: 10.18632/oncotarget.15039.

Efficient Approach to Correct Read Alignment for Pseudogene Abundance Estimates.

IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):522-533. doi: 10.1109/TCBB.2016.2591533. Epub 2016 Jul 14.

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping.

PLoS Comput Biol. 2015 Oct 20;11(10):e1004491. doi: 10.1371/journal.pcbi.1004491. eCollection 2015 Oct.

Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes.

Mol Ecol Resour. 2016 Mar;16(2):446-58. doi: 10.1111/1755-0998.12465. Epub 2015 Oct 14.

EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

BMC Bioinformatics. 2015 Sep 3;16:278. doi: 10.1186/s12859-015-0704-z.

An integrated inspection of the somatic mutations in a lung squamous cell carcinoma using next-generation sequencing.

PLoS One. 2013 Nov 11;8(11):e78823. doi: 10.1371/journal.pone.0078823. eCollection 2013.

The transcriptional consequences of somatic amplifications, deletions, and rearrangements in a human lung squamous cell carcinoma.

Neoplasia. 2012 Nov;14(11):1075-86. doi: 10.1593/neo.121380.
