Struck Daniel, Lawyer Glenn, Ternes Anne-Marie, Schmit Jean-Claude, Bercoff Danielle Perez
Laboratory of Retrovirology, CRP-Santé, 84, Val Fleuri, L-1526, Luxembourg.
Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Campus E1 4, 66123 Saarbrücken, Germany.
Nucleic Acids Res. 2014 Oct;42(18):e144. doi: 10.1093/nar/gku739. Epub 2014 Aug 12.
Viral sequence classification has wide applications in clinical, epidemiological, structural and functional categorization studies. Most existing approaches rely on an initial alignment step followed by classification based on phylogenetic or statistical algorithms. Here we present an ultrafast alignment-free subtyping tool for human immunodeficiency virus type 1 (HIV-1), adapted from Prediction by Partial Matching compression. This tool, named COMET, was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets (1,090,698 and 10,625 sequences, respectively). For known subtypes, COMET's sensitivity and specificity were comparable to or higher than those of the two other subtyping tools on both data sets. COMET also excelled in detecting and identifying new recombinant forms, a frequent feature of the HIV epidemic. Runtime comparisons showed that COMET was almost as fast as USEARCH. This study demonstrates the advantages of alignment-free classification of viral sequences, which feature high rates of variation, recombination and insertions/deletions. COMET is free to use via an online interface.
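To make the idea of compression-based, alignment-free classification concrete, the following is a minimal sketch and not COMET's actual implementation. It assumes a single fixed context order with add-one smoothing, whereas Prediction by Partial Matching blends several context orders. One context model is trained per reference group, and a query is assigned to the group whose model encodes it in the fewest bits. All function names, the `toy_refs` data and the group labels are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_context_model(sequences, order=3):
    """Count how often each nucleotide follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts

def code_length_bits(seq, counts, order=3):
    """Bits needed to encode `seq` under the model, using add-one smoothing."""
    bits = 0.0
    for i in range(order, len(seq)):
        ctx = counts.get(seq[i - order:i], {})
        total = sum(ctx.values())
        p = (ctx.get(seq[i], 0) + 1) / (total + len(ALPHABET))
        bits -= math.log2(p)
    return bits

def classify(query, references, order=3):
    """Return the label whose model compresses `query` best, plus all scores."""
    scores = {
        label: code_length_bits(query, train_context_model(seqs, order), order)
        for label, seqs in references.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical toy reference groups (not real HIV-1 subtype sequences).
toy_refs = {
    "group_1": ["ACGTACGTACGTACGTACGT"],
    "group_2": ["AAAATTTTAAAATTTTAAAATTTT"],
}

if __name__ == "__main__":
    label, scores = classify("ACGTACGTACGT", toy_refs)
    print(label, scores)
```

No alignment is computed at any point: each query is scored in a single pass over its nucleotides, which is why approaches of this kind can be fast and tolerant of insertions and deletions.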