A large-scale evaluation of algorithms to calculate average nucleotide identity.

Author information

Yoon Seok-Hwan, Ha Sung-Min, Lim Jeongmin, Kwon Soonjae, Chun Jongsik

Affiliations

School of Biological Sciences & Institute of Molecular Biology & Genetics, Seoul National University, Seoul, 151-742, Republic of Korea.

ChunLab, Inc., Seoul National University, Seoul, 151-742, Republic of Korea.

Publication information

Antonie Van Leeuwenhoek. 2017 Oct;110(10):1281-1286. doi: 10.1007/s10482-017-0844-4. Epub 2017 Feb 15.

DOI:10.1007/s10482-017-0844-4

PMID:28204908

Abstract

Average nucleotide identity (ANI) is a category of computational analysis that can be used to define species boundaries of Archaea and Bacteria. Calculating ANI usually involves the fragmentation of genome sequences, followed by nucleotide sequence search, alignment, and identity calculation. The original algorithm to calculate ANI used the BLAST program as its search engine. An improved ANI algorithm, called OrthoANI, was developed to accommodate the concept of orthology. Here, we compared four algorithms to compute ANI, namely ANIb (ANI algorithm using BLAST), ANIm (ANI using MUMmer), OrthoANIb (OrthoANI using BLAST) and OrthoANIu (OrthoANI using USEARCH), using >100,000 pairs of genomes with various genome sizes. When compared with ANIb values, which are considered the standard, OrthoANIb and OrthoANIu values showed good correlation across the whole range of ANI values. ANIm showed poor correlation for ANI values of <90%. ANIm and OrthoANIu ran faster than ANIb by an order of magnitude. When genomes larger than 7 Mbp were analysed, the run-times of ANIm and OrthoANIu were shorter than that of ANIb by 53- and 22-fold, respectively. In conclusion, ANI calculation can be greatly sped up by the OrthoANIu method without losing accuracy. A web service that can be used to calculate OrthoANIu between a pair of genome sequences is available at http://www.ezbiocloud.net/tools/ani . For large-scale calculation and integration in bioinformatics pipelines, a standalone Java program is available for download at http://www.ezbiocloud.net/tools/orthoaniu .
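The general pipeline described in the abstract (fragmentation, sequence search, identity calculation, averaging) can be illustrated with a toy sketch. This is not the authors' implementation and does not call BLAST, MUMmer or USEARCH: the search step is replaced by a naive ungapped scan, and the function names (`fragment`, `best_identity`, `toy_ani`), the fragment size and the identity cutoff are all hypothetical choices made only to keep the example small and runnable.

```python
from statistics import mean


def fragment(seq, size):
    """Cut seq into consecutive, non-overlapping fragments; a short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def best_identity(frag, ref):
    """Highest ungapped identity (0..1) of frag over every start position in ref.

    Naive stand-in for the search/alignment step; a real tool would use
    BLAST, MUMmer or USEARCH here.
    """
    n = len(frag)
    best = 0.0
    for start in range(len(ref) - n + 1):
        matches = sum(a == b for a, b in zip(frag, ref[start:start + n]))
        best = max(best, matches / n)
    return best


def toy_ani(query, ref, size=10, min_identity=0.3):
    """Mean identity (%) of query fragments whose best hit reaches min_identity.

    size and min_identity are illustrative values, not those of any published
    ANI method. Returns None when no fragment passes the cutoff.
    """
    hits = [best_identity(f, ref) for f in fragment(query, size)]
    kept = [h for h in hits if h >= min_identity]
    return 100.0 * mean(kept) if kept else None


if __name__ == "__main__":
    genome_a = "ACGTACGTAAGGCCTTAGGA"
    genome_b = "TCGTACGTAAGGCCTTAGGA"  # differs from genome_a at one position
    print(toy_ani(genome_a, genome_b))
```

The sketch is one-directional and ignores orthology. As the abstract notes, OrthoANI was developed to accommodate orthology, which this toy version does not attempt to model.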
