Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089.
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
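To make the string-to-graph shift concrete, here is a minimal sketch, not the authors' method, of a toy sequence graph. Nodes hold DNA segments and directed edges connect them, so each genome in the collection is spelled by one source-to-sink path. The class SequenceGraph, its methods and the three short example sequences are hypothetical and exist only for illustration: the graph encodes one shared prefix, a single-base variant (A or C) and a one-base deletion, and stores 9 bases where the three separate strings would need 23.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Hypothetical toy sequence graph: nodes carry DNA strings, edges are directed."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: dict[int, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq
        self.edges.setdefault(node_id, [])

    def add_edge(self, src: int, dst: int) -> None:
        self.edges[src].append(dst)

    def spell(self, path: list[int]) -> str:
        # Concatenate node sequences along a path to recover one genome.
        return "".join(self.nodes[n] for n in path)

    def all_paths(self, source: int, sink: int) -> list[list[int]]:
        # Depth-first enumeration of source-to-sink paths; adequate only for
        # small acyclic toy graphs (the number of paths can grow exponentially).
        paths: list[list[int]] = []
        stack = [(source, [source])]
        while stack:
            node, path = stack.pop()
            if node == sink:
                paths.append(path)
                continue
            for nxt in self.edges[node]:
                stack.append((nxt, path + [nxt]))
        return paths


# Illustrative graph: shared prefix, an A/C variant, and a deletion edge 1 -> 4.
graph = SequenceGraph()
for node_id, seq in [(1, "ACGT"), (2, "A"), (3, "C"), (4, "GGT")]:
    graph.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

spelled = {graph.spell(p) for p in graph.all_paths(source=1, sink=4)}
graph_bases = sum(len(s) for s in graph.nodes.values())
string_bases = sum(len(s) for s in spelled)

if __name__ == "__main__":
    print(sorted(spelled))
    print(graph_bases, "bases in graph vs", string_bases, "as separate strings")
```

Real pan-genome graphs add much more than this (bidirected edges for reverse complements, cycles from rearrangements, path indexes for read mapping); the sketch only shows why shared sequence is stored once while every input genome remains recoverable as a path.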