IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, Bozetechova 2, 612 66, Brno, Czechia.
BMC Bioinformatics. 2021 May 20;22(1):258. doi: 10.1186/s12859-021-04177-6.
Insertion sequence elements (IS elements) are the smallest and most abundant mobile elements in prokaryotic genomes, and they have been shown to play a significant role in genome organization and evolution. Understanding their function in the host genome requires an effective detection and annotation tool, a need that grows more pressing as genomic and metagenomic data accumulate rapidly. Existing tools for IS element detection and annotation usually rely on sequence similarity comparison against a database of known IS families. As a result, they have limited ability to discover distant and putative novel IS elements.
In this paper, we present digIS, a software tool based on profile hidden Markov models built from the catalytic domains of transposases. It performs very well at detecting known IS elements when tested on datasets with manually curated annotation. Its main contribution is the ability to detect distant and putative novel IS elements while keeping the number of false positives moderate. In this respect it outperforms existing tools, especially on large datasets of archaeal and bacterial genomes.
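To illustrate why a profile-based search can recognize distant relatives that pairwise similarity to a single known sequence may miss, here is a minimal sketch of position-specific scoring. It is a deliberate simplification and not the digIS implementation: it keeps only the match states of a profile (no insert or delete states, so it is closer to a position-specific scoring matrix than to a full profile hidden Markov model), it assumes a uniform amino-acid background, and the aligned peptides are made-up toy data. The names `build_profile` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped alignment.

    Simplification: match states only, uniform background frequencies.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Toy, made-up alignment of a short conserved region.
toy_alignment = ["DGDEA", "DGDQA", "DADEA", "NGDEA"]
toy_profile = build_profile(toy_alignment)

# "NADQA" is not identical to any row of the alignment, yet every residue
# was observed in its column, so the profile still scores it positively.
distant_score, _ = best_hit(toy_profile, "NADQA")
unrelated_score, _ = best_hit(toy_profile, "MKTLV")
print(distant_score > 0, unrelated_score < 0)
```

Because each column is scored independently, a sequence that mixes residues seen in different members of the family can still score well. A full profile hidden Markov model extends this idea with insertion and deletion states and is typically built and searched with dedicated software rather than hand-written code.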
digIS is a software tool that takes a novel approach based on manually curated profile hidden Markov models and can detect distant and putative novel IS elements. Although digIS can also find known IS elements, we expect it to be used primarily by scientists interested in finding novel ones. The tool is available at https://github.com/janka2012/digIS.