United States Military HIV Research Program, Rockville, Maryland, USA.
PLoS One. 2010 Jun 1;5(6):e10829. doi: 10.1371/journal.pone.0010829.
The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is that sequences are compared with a single-pass algorithm. Avoiding recursion during sequence alignment offers advantages in speed and in-process resources. The graphical technique also allows multiple models of comparison to be used, depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly.
We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid in diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap-associated problems, which are predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string-based alignments of the whole genome. This paper describes the W-curve algorithm itself and how we have adapted it for comparison of similar HIV-1 genomes. A tree-building method is developed with the W-curve that utilizes a novel cylindrical coordinate distance method and a gap analysis method. HIV-1 C2-V5 env sequence regions from a mother/infant cohort study are used in the comparison.
The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE.
Significant potential exists for utilizing this method in place of conventional string-based alignment tools, such as Clustal X, for HIV-1 genomes. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment. A description of the W-curve generation process is presented, including a comparison technique that aligns extremes of the curves to effectively phase-shift them past the HIV-1 gap problem. Besides yielding similar neighbor-joining phenogram topologies, most mother and infant C2-V5 sequences in the cohort pairs geometrically map closest to each other, indicating that the W-curve heuristics overcame the gap problem.
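The abstract does not spell out how a W-curve is built or how the cylindrical coordinate distance is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a chaos-game-style rule (each base pulls the current point halfway toward a corner assigned to that base), uses the sequence index as the cylinder axis, and averages the point-to-point chord length between two curves. The corner layout, the halfway step, and the names `CORNERS`, `w_curve`, and `curve_distance` are all hypothetical, and the gap handling described above is not modeled.

```python
import math

# Hypothetical corner assignment for the four bases (an assumption,
# not taken from the paper).
CORNERS = {
    "A": (1.0, 0.0),
    "C": (0.0, 1.0),
    "G": (-1.0, 0.0),
    "T": (0.0, -1.0),
}


def w_curve(seq):
    """Return a list of (r, theta, z) points for a nucleotide string.

    Single pass: each base moves the current (x, y) point halfway toward
    that base's corner; z is simply the position in the sequence.
    """
    x, y = 0.0, 0.0
    points = []
    for z, base in enumerate(seq.upper()):
        if base == "U":  # treat RNA uracil like thymine
            base = "T"
        # Assumption: unknown symbols pull the point toward the origin.
        cx, cy = CORNERS.get(base, (0.0, 0.0))
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((math.hypot(x, y), math.atan2(y, x), z))
    return points


def curve_distance(a, b):
    """Mean chord length between two curves at matching z positions.

    The chord between (r1, t1) and (r2, t2) in the plane follows from
    the law of cosines. Only the overlapping prefix is compared.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    total = 0.0
    for (r1, t1, _), (r2, t2, _) in zip(a, b):
        sq = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)
        total += math.sqrt(max(0.0, sq))  # guard against tiny negatives
    return total / n
```

A pairwise matrix of such distances could then be handed to a standard neighbor-joining routine to build a tree. Because each curve is produced in one pass and compared position by position, the cost grows linearly with sequence length, which is the property the abstract emphasizes.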