Rand Knut D, Grytten Ivar, Nederbragt Alexander J, Storvik Geir O, Glad Ingrid K, Sandve Geir K
Department of Mathematics, University of Oslo, Moltke Moes vei 35, Oslo, 0851, Norway.
Department of Informatics, University of Oslo, Gaustadalleen 23 B, Oslo, 0371, Norway.
BMC Bioinformatics. 2017 May 18;18(1):263. doi: 10.1186/s12859-017-1678-9.
It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes.
We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We demonstrate the advantage of these methods by representing genes on a graph built from the newest assembly of the human genome (GRCh38) and its alternative loci, which cover highly variable regions.
More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We provide a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph. An interactive web tool that uses this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords.
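To make the idea of an offset-based coordinate and a graph interval concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the API of the offsetbasedgraph package; the names `Position`, `OffsetGraph`, and `Interval`, the region path IDs, and all lengths are hypothetical and chosen only for illustration. The sketch assumes that a coordinate is a pair of a region path ID and a 0-based offset, and that an interval is a start position, an exclusive end position, and the ordered list of region paths it traverses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Hypothetical coordinate: a region path ID plus a 0-based offset into it."""
    region_path: str
    offset: int


class OffsetGraph:
    """Hypothetical graph: region path lengths plus directed edges between them."""

    def __init__(self, lengths, edges):
        self.lengths = dict(lengths)
        self.edges = {src: set(dsts) for src, dsts in edges.items()}

    def has_edge(self, src, dst):
        return dst in self.edges.get(src, set())


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval: start (inclusive), end (exclusive), and its path."""
    start: Position
    end: Position
    region_paths: tuple

    def is_valid(self, graph):
        paths = self.region_paths
        # The path must be non-empty and begin/end where the positions say.
        if not paths or paths[0] != self.start.region_path:
            return False
        if paths[-1] != self.end.region_path:
            return False
        if any(p not in graph.lengths for p in paths):
            return False
        # Consecutive region paths must be joined by an edge in the graph.
        if not all(graph.has_edge(a, b) for a, b in zip(paths, paths[1:])):
            return False
        # Offsets must fall inside their region paths (end is exclusive).
        if not 0 <= self.start.offset < graph.lengths[paths[0]]:
            return False
        if not 0 < self.end.offset <= graph.lengths[paths[-1]]:
            return False
        # A single-region interval must not be empty or reversed.
        return len(paths) > 1 or self.start.offset < self.end.offset

    def length(self, graph):
        paths = self.region_paths
        if len(paths) == 1:
            return self.end.offset - self.start.offset
        first = graph.lengths[paths[0]] - self.start.offset
        middle = sum(graph.lengths[p] for p in paths[1:-1])
        return first + middle + self.end.offset


# Illustrative graph: a main path (a -> b -> c) with one alternative locus
# (alt_1) that replaces b. All IDs and lengths are made up.
graph = OffsetGraph(
    lengths={"chr1_a": 100, "chr1_b": 50, "alt_1": 60, "chr1_c": 200},
    edges={"chr1_a": {"chr1_b", "alt_1"}, "chr1_b": {"chr1_c"}, "alt_1": {"chr1_c"}},
)

# The same hypothetical gene, once through the main path and once through
# the alternative locus; only the list of region paths differs.
gene_main = Interval(Position("chr1_a", 80), Position("chr1_c", 30),
                     ("chr1_a", "chr1_b", "chr1_c"))
gene_alt = Interval(Position("chr1_a", 80), Position("chr1_c", 30),
                    ("chr1_a", "alt_1", "chr1_c"))

print(gene_main.length(graph), gene_alt.length(graph))
```

In this sketch the two intervals share their start and end coordinates, so storing the traversed region paths is what distinguishes the main-path version of the gene from the version on the alternative locus.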