Durbin Richard, De Sanctis Bianca, Blumer Moritz
Department of Genetics, University of Cambridge, Cambridge, England, CB2 3EH, UK.
Wellcome Open Res. 2023 Sep 13;8:401. doi: 10.12688/wellcomeopenres.19568.1. eCollection 2023.
Sequences derived from circular DNA molecules (i.e. most bacterial, viral and plastid genomes) are expected to be linearised and rotated to a common start position for most downstream analyses including alignment. Despite this being a common and straightforward task, available software is either limited to a small number of input sequences, lacks the option to specify a custom anchor string, or requires a commercial license. Here, we present rotate, a simple, open source command line program written in C with no external dependencies, which can rotate a set of input sequences to a custom anchor string (allowing for a specified number of mismatches), or offset the input sequences to the desired position. The combination of both functionalities allows the rotation of all input sequences to any desired starting position, enabling downstream analysis. rotate is extremely fast and scales linearly with the number of input sequences, taking only seconds to rotate over a thousand mitochondrial sequences.
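The two operations described above, rotating to an anchor string with a bounded number of mismatches and shifting by an offset, can be illustrated with a minimal Python sketch. This is not the rotate program's implementation or its command line interface. The function names are hypothetical, mismatches are assumed to be substitutions only (Hamming distance), only the forward strand is searched, and the first matching position is used.

```python
def count_mismatches(seq, anchor, start):
    """Count substitutions between anchor and seq, reading seq circularly from start."""
    n = len(seq)
    return sum(1 for i, base in enumerate(anchor) if seq[(start + i) % n] != base)


def find_anchor(seq, anchor, max_mismatches=0):
    """Return the first circular position where anchor matches with at most
    max_mismatches substitutions, or None if there is no such position."""
    if not seq or len(anchor) > len(seq):
        return None
    for start in range(len(seq)):
        if count_mismatches(seq, anchor, start) <= max_mismatches:
            return start
    return None


def rotate_by_offset(seq, offset):
    """Rotate a circular sequence so that position `offset` becomes the new start.
    Negative offsets move the start backwards around the circle."""
    if not seq:
        return seq
    k = offset % len(seq)
    return seq[k:] + seq[:k]


def rotate_to_anchor(seq, anchor, max_mismatches=0, offset=0):
    """Rotate seq so that it starts at the anchor, then shift by an extra offset.
    Returns None when the anchor is not found (assumed behaviour for this sketch)."""
    pos = find_anchor(seq, anchor, max_mismatches)
    if pos is None:
        return None
    return rotate_by_offset(seq, pos + offset)


if __name__ == "__main__":
    # The anchor spans the linearisation point of this toy sequence.
    print(rotate_to_anchor("GATCAC", "ACGA"))
    # One substitution is tolerated when max_mismatches=1.
    print(rotate_to_anchor("GGTTACGATC", "ACGT", max_mismatches=1))
```

Combining the anchor position with an offset is what allows any starting position to be reached: the anchor fixes a shared reference point in every sequence, and the offset moves the start relative to it. The scan is a naive O(n·m) search per sequence, so the total work grows linearly with the number of input sequences.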