Wei Vivian Li, Jingyi Jessica Li
Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095-1554, USA.
Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095-088, USA.
Quant Biol. 2018 Sep;6(3):195-209. doi: 10.1007/s40484-018-0144-7. Epub 2018 Aug 10.
Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples, and they have revolutionized transcriptomic studies. Analyzing RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date.
We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance.
Statistical and computational methods for analyzing RNA-seq data have advanced significantly over the past decade. However, methods developed to answer the same biological question often rely on different statistical models and perform differently under different scenarios. This review discusses and compares multiple commonly used statistical models in terms of their assumptions, in the hope of helping users select appropriate methods and assisting developers in future method development.