NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences.

Affiliation

School of Mathematics and Statistics, UNSW, Sydney, NSW, Australia.

Publication information

BMC Bioinformatics. 2021 Feb 8;22(1):54. doi: 10.1186/s12859-020-03901-y.

Abstract

BACKGROUND

The high variability in the envelope regions of some viruses, such as HIV, allows the virus to establish infection and to escape subsequent immune surveillance. This variability, together with the increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for multiple sequence alignment (MSA) methods, which provide the first step in the analysis of these sequences. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences.

RESULTS

We developed an automated library building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single-site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences, and a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum-of-pairs scores, column score, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. In addition to this application to HIV, our method can be used for other highly variable glycoproteins such as the hepatitis C virus envelope.
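To make the entropy measure mentioned above concrete, the following is a minimal sketch of per-column Shannon entropy for an alignment. It is an illustration only and is not taken from NGlyAlign: the function name `column_entropies` is hypothetical, gaps are treated as an ordinary symbol, and the exact entropy score used in the paper may be defined differently.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Return the Shannon entropy (in bits) of every alignment column.

    `alignment` is a list of equal-length aligned sequences.
    Assumption for this sketch: the gap character '-' is counted
    as one more symbol rather than being ignored.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*alignment):  # iterate over columns, not rows
        counts = Counter(column)
        total = len(column)
        # H = -sum(p * log2(p)) over the symbols observed in this column
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    toy = ["NAT", "NAS", "NGT", "NGS"]
    print(column_entropies(toy))
```

A fully conserved column scores 0 bits, and a column split evenly between two residues scores 1 bit, so lower totals indicate more consistent columns under this simplified definition.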

CONCLUSIONS

NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0 .
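As an illustration of what mapping N-linked glycosylation motifs involves, the sketch below locates potential N-linked glycosylation sequons in an ungapped protein sequence using the standard N-X-S/T pattern, where X is any residue except proline. It is a hedged example, not the NGlyAlign implementation: the function name `find_sequons` is hypothetical, and the tool's own motif library and constraint construction are not reproduced here.

```python
import re

# N, then any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 0-based start positions of N-X-[S/T] sequons (X != P).

    Assumption for this sketch: `protein` is an ungapped amino acid
    sequence in one-letter code.
    """
    return [match.start() for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_sequons("MNGTNPSANNTS"))
```

In the toy sequence, the site at position 4 is skipped because proline follows the asparagine, while the overlapping sites at positions 8 and 9 are both found.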
