Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression.

Affiliations

The Jackson Laboratory, Bar Harbor, USA.

Department of Genetics, The University of North Carolina, Chapel Hill, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):2177-2184. doi: 10.1093/bioinformatics/bty078.

Abstract

MOTIVATION

Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation.
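The gene, then isoform, then allele ordering described above can be illustrated with a toy expectation-maximization loop. This is a minimal sketch under stated assumptions, not the EMASE implementation: the function name `hierarchical_em`, the helper `_shares`, and the read representation (one set of `(gene, isoform, allele)` tuples per read) are hypothetical, and transcript length, sequencing error and alignment quality are ignored.

```python
from collections import defaultdict


def _shares(weights):
    """Normalize non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights.values())
    if total <= 0:
        return {key: 1.0 / len(weights) for key in weights}
    return {key: w / total for key, w in weights.items()}


def hierarchical_em(reads, n_iter=100):
    """Allocate each read among genes, then isoforms, then alleles.

    reads: list of sets of (gene, isoform, allele) tuples, one set per read,
           listing every target the read aligns to equally well.
    Returns a dict mapping each target to its expected read count.
    """
    targets = {t for read in reads for t in read}
    counts = {t: 1.0 for t in targets}  # uniform starting point (assumption)
    for _ in range(n_iter):
        # Aggregate current estimates to the gene and isoform levels.
        gene_tot, iso_tot = defaultdict(float), defaultdict(float)
        for (g, i, _a), c in counts.items():
            gene_tot[g] += c
            iso_tot[(g, i)] += c
        new = {t: 0.0 for t in targets}
        for read in reads:
            # Level 1: split the read among the genes it aligns to.
            w_gene = _shares({g: gene_tot[g] for g, _, _ in read})
            for g, wg in w_gene.items():
                # Level 2: within that gene, split among compatible isoforms.
                w_iso = _shares(
                    {i: iso_tot[(g, i)] for gg, i, _ in read if gg == g}
                )
                for i, wi in w_iso.items():
                    # Level 3: within that isoform, split between alleles.
                    w_allele = _shares(
                        {
                            a: counts[(g, i, a)]
                            for gg, ii, a in read
                            if (gg, ii) == (g, i)
                        }
                    )
                    for a, wa in w_allele.items():
                        new[(g, i, a)] += wg * wi * wa
        counts = new
    return counts


# Toy data: 6 reads unique to allele A, 2 unique to allele B,
# and 4 reads that cannot distinguish the two alleles.
reads = (
    [{("G1", "T1", "A")}] * 6
    + [{("G1", "T1", "B")}] * 2
    + [{("G1", "T1", "A"), ("G1", "T1", "B")}] * 4
)
counts = hierarchical_em(reads)
a, b = counts[("G1", "T1", "A")], counts[("G1", "T1", "B")]
print(f"A={a:.2f} B={b:.2f} allelic proportion of A={a / (a + b):.2f}")
```

In this toy case the four allele-ambiguous reads end up split 3:1, following the 6:2 ratio of the uniquely mapped reads, so the expected counts settle at 9 and 3. Discarding the ambiguous reads would keep the same ratio here but lose a third of the data. The per-level normalization is the design choice that distinguishes this sketch from a flat allocation, where a read's weight would be normalized once across all of its targets.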

RESULTS

Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects.

AVAILABILITY AND IMPLEMENTATION

EMASE software is available at https://github.com/churchill-lab/emase.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
