Didier Gilles, Debomy Laurent, Pupin Maude, Zhang Ming, Grossmann Alexander, Devauchelle Claudine, Laprevotte Ivan
Institut Mathématique de Luminy, UMR 6206, Campus de Luminy, Case 907, 13288 Marseille Cedex 9, France.
BMC Bioinformatics. 2007 Jan 2;8:1. doi: 10.1186/1471-2105-8-1.
In general, the construction of trees is based on sequence alignments. This procedure, however, leads to a loss of information when parts of the alignment (for instance, ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.
In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian Immunodeficiency Virus) sequence data are used to evaluate this method. The program produces tree topologies that are identical to those obtained by a combination of standard methods detailed in the HIV Sequence Compendium. Manual alignment editing is not necessary at any stage. Furthermore, only one user-specified parameter is needed for constructing trees.
Extensive tests on HIV/SIV subtyping showed that the virus classifications produced by our method are in good agreement with our best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat) regions that are not tractable by regular alignment methods because of frequent duplications, insertions and deletions. Our method, however, is not limited to HIV/SIV subtyping. It provides an alternative approach to tree construction that avoids a time-consuming alignment procedure.
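To make the idea of an alignment-free dissimilarity matrix concrete, the following is a minimal sketch under stated assumptions. It is not the algorithm evaluated in this paper: it swaps in a generic word-count (k-mer) comparison with a Bray-Curtis dissimilarity, chosen only because it also needs no alignment and takes a single user-specified parameter (here the word length `k`). The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k):
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def dissimilarity(p, q):
    """Bray-Curtis dissimilarity between two k-mer count profiles.

    Returns 0.0 for identical profiles and 1.0 when no word is shared.
    """
    shared = sum(min(p[w], q[w]) for w in p.keys() & q.keys())
    total = sum(p.values()) + sum(q.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def dissimilarity_matrix(seqs, k):
    """Symmetric matrix of pairwise dissimilarities; k is the only parameter."""
    profiles = [kmer_profile(s, k) for s in seqs]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dissimilarity(profiles[i], profiles[j])
    return matrix


# Toy input (illustrative only, not HIV/SIV data).
toy = ["ACGTACGT", "ACGTACGT", "TTTTTTTT"]
for row in dissimilarity_matrix(toy, k=3):
    print(row)
```

A matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining. No sequence is aligned or trimmed at any point, which is the property the abstract emphasizes.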