Robert Koch Institute, Nordufer 20, Berlin, 13353, Germany.
Robert Koch Institute, Wernigerode Branch, Burgstraße 37, Wernigerode, 38855, Germany.
BMC Genomics. 2018 Jan 15;19(1):47. doi: 10.1186/s12864-017-4401-3.
The increasing application of next-generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach of representing a species or a population with a single reference sequence and a set of variations cannot capture their full diversity and introduces bias towards the chosen reference. Multiple sequences need to be represented in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to keep pace with the constantly changing set of available genomes.
We introduce seq-seq-pan, a framework that provides methods for adding genomes to, and removing genomes from, a set of aligned genomes, and uses these methods to construct a whole genome alignment. Throughout the sequential workflow, the alignment is optimized to yield a linear representation that is representative of the whole set of aligned genomes, which enables its use for annotation and in downstream analyses.
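To make the idea of adding and removing genomes and deriving a linear representation more concrete, here is a toy sketch. It is not seq-seq-pan's algorithm or API: the class and method names (`ToyAlignment`, `add_genome`, `remove_genome`, `consensus`, `to_consensus_pos`) are hypothetical, each new genome is assumed to arrive already aligned to the existing columns (the real framework computes alignments itself), and the linear representation is a simple majority-vote consensus chosen purely for illustration.

```python
from collections import Counter

GAP = "-"


class ToyAlignment:
    """Toy gapped multiple alignment: all rows share the same column count.

    Hypothetical illustration only; not seq-seq-pan's data model or API.
    Invariant assumed: no column consists solely of gaps.
    """

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def width(self) -> int:
        return len(next(iter(self.rows.values()))) if self.rows else 0

    def add_genome(self, name: str, aligned_row: str) -> None:
        # Assumption: the row is already aligned to the existing columns.
        # A real aligner would compute this (and might insert new columns).
        if self.rows and len(aligned_row) != self.width():
            raise ValueError("row length must match the alignment width")
        self.rows[name] = aligned_row

    def remove_genome(self, name: str) -> None:
        del self.rows[name]
        if not self.rows:
            return
        # Drop columns that are gaps in every remaining genome.
        keep = [c for c in range(self.width())
                if any(row[c] != GAP for row in self.rows.values())]
        self.rows = {n: "".join(r[c] for c in keep) for n, r in self.rows.items()}

    def consensus(self) -> str:
        # One character per column: the most frequent non-gap residue.
        out = []
        for c in range(self.width()):
            counts = Counter(r[c] for r in self.rows.values() if r[c] != GAP)
            if not counts:
                raise ValueError(f"column {c} contains only gaps")
            out.append(counts.most_common(1)[0][0])
        return "".join(out)

    def to_consensus_pos(self, name: str, pos: int) -> int:
        # Map a 0-based ungapped position in genome `name` to a 0-based
        # consensus position; each column yields exactly one consensus
        # character, so the column index is the consensus position.
        seen = -1
        for c, ch in enumerate(self.rows[name]):
            if ch != GAP:
                seen += 1
                if seen == pos:
                    return c
        raise IndexError("position lies outside the genome")


aln = ToyAlignment()
aln.add_genome("g1", "ACGT-A")
aln.add_genome("g2", "ACCTTA")
aln.add_genome("g3", "A-GT-A")
print(aln.consensus())                 # ACGTTA
print(aln.to_consensus_pos("g3", 1))   # 2
aln.remove_genome("g2")
print(aln.consensus())                 # ACGTA
```

The position mapping is the part that connects to annotation: because every genome's coordinates can be projected onto one linear sequence, annotations from different genomes can be placed on a shared coordinate system. Removing a genome forces columns that become all-gap to be dropped, which is why the consensus shrinks from six to five characters in the usage example.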
By providing dynamic updates and optimized processing, our approach enables the use of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics.