Department of Statistics and Public Health Sciences, Penn State University, 514A Wartik Building, University Park, PA 16802, USA.
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Center for Comprehensive Informatics, Emory University, 1518 Clifton Rd., N.E., Atlanta, GA 30322, USA.
Genes (Basel). 2010 Sep 27;1(2):317-34. doi: 10.3390/genes1020317.
The recent arrival of ultra-high-throughput next-generation sequencing (NGS) technologies has revolutionized genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. The rapid deployment of NGS in a variety of sequencing-based experiments has led to the fast accumulation of massive amounts of sequencing data. To process this new type of data, a torrent of increasingly sophisticated algorithms and software tools is emerging to support the analysis stage of NGS applications. In this article, we strive to comprehensively identify the critical challenges that arise at all stages of NGS data analysis and provide an objective overview of what existing work has achieved. We also highlight selected areas that need much further research to improve our current ability to extract as much information as possible from NGS data. The article focuses on applications dealing with ChIP-Seq and RNA-Seq.