1 Division of Pulmonary and Critical Care, Department of Medicine.
2 Division of Thoracic Surgery, Department of Surgery.
Am J Respir Cell Mol Biol. 2018 Aug;59(2):145-157. doi: 10.1165/rcmb.2017-0430TR.
Since the first publications coining the term RNA-seq (RNA sequencing) appeared in 2008, the number of publications containing RNA-seq data has grown exponentially, hitting an all-time high of 2,808 publications in 2016 (PubMed). With this wealth of RNA-seq data being generated, it is a challenge to extract maximal meaning from these datasets, and without the appropriate skills and background, there is a risk of misinterpreting these data. However, a general understanding of the principles underlying each step of RNA-seq data analysis allows investigators without a background in programming and bioinformatics to critically analyze their own datasets as well as published data. Our goals in the present review are to break down the steps of a typical RNA-seq analysis and to highlight the pitfalls and checkpoints along the way that are vital for bench scientists and biomedical researchers performing experiments that use RNA-seq.