Salzman Julia, Jiang Hui, Wong Wing Hung
Research Associate in the Department of Statistics and Biochemistry, Stanford University, Stanford, California 94305, USA.
Stat Sci. 2011 Feb;26(1). doi: 10.1214/10-STS343.
Ultra-high-throughput sequencing of RNA (RNA-Seq) has recently been developed as an approach for analyzing gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data. The model is flexible enough to accommodate both single-end and paired-end RNA-Seq data, as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator is provided. Further, it is shown that paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at a fixed sequencing depth. Simulation studies are also given.
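The abstract does not spell out the form of the model or the optimizer, so the following is only an illustrative sketch of maximum likelihood estimation of isoform abundance, not the authors' implementation. It assumes that reads are grouped into categories, that the count in category j is Poisson with mean sum_i theta_i * rates[i][j], and that the sampling rates are known. The function name, the `rates` matrix, and the toy numbers are all hypothetical; the fit uses a standard EM update for a Poisson linear model.

```python
def estimate_isoform_abundance(counts, rates, n_iter=1000, tol=1e-12):
    """EM estimate of isoform abundances theta (illustrative sketch).

    Assumed model (not taken from the abstract):
        counts[j] ~ Poisson(sum_i theta[i] * rates[i][j])
    counts: observed read count per read category
    rates:  rates[i][j] = assumed sampling rate of category j from isoform i
    """
    n_iso = len(rates)
    n_cat = len(counts)
    # Total sampling rate of each isoform across all read categories.
    total_rate = [sum(row) for row in rates]
    # Start from equal abundances that roughly match the total read count.
    theta = [sum(counts) / sum(total_rate)] * n_iso
    for _ in range(n_iter):
        # Expected count per category under the current abundances.
        expected = [
            sum(theta[i] * rates[i][j] for i in range(n_iso))
            for j in range(n_cat)
        ]
        # Observed-to-expected ratio, guarding against empty categories.
        ratio = [
            counts[j] / expected[j] if expected[j] > 0 else 0.0
            for j in range(n_cat)
        ]
        # Multiplicative EM update for each isoform.
        new_theta = [
            theta[i] * sum(rates[i][j] * ratio[j] for j in range(n_cat))
            / total_rate[i]
            for i in range(n_iso)
        ]
        shift = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if shift < tol:
            break
    return theta


# Toy example: two isoforms and three read categories (one shared,
# one specific to each isoform). The counts are the exact Poisson means
# for theta = (10, 20), so the estimate should approach those values.
rates = [[1.0, 0.5, 0.0],
         [1.0, 0.0, 0.3]]
counts = [30, 5, 6]
theta_hat = estimate_isoform_abundance(counts, rates)
```

In this toy setup the shared category alone cannot separate the two isoforms; the isoform-specific categories are what make the abundances identifiable. Paired-end reads would, under these assumptions, simply contribute more such informative categories.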