Division of Basic Sciences, University of Crete Medical School, Heraklion 71110, Greece.
BioData Min. 2013 Jul 25;6(1):13. doi: 10.1186/1756-0381-6-13.
Elucidating the content of a DNA sequence is critical to understanding and decoding the genetic information of any biological system. As next generation sequencing (NGS) techniques have become cheaper and higher in throughput over time, they have led to major innovations and breakthrough findings in many areas of biology. Some of the areas being shaped by these technological advances are the evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAS), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques are key players in modern biological research, analyzing and interpreting the vast amount of data they produce is not a trivial task and remains a great challenge in the field of bioinformatics. Efficient tools are therefore essential to cope with information overload, tackle the high complexity and provide meaningful visualizations that make knowledge extraction easier. In this article, we briefly describe the sequencing methodologies and the equipment available for these analyses, as well as the data formats of the files they produce. We then present a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. Finally, we comment on their functionality, strengths and weaknesses, and we discuss how future applications could further develop in this field.