Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

Authors

Novitsky Vlad, Moyo Sikhulile, Lei Quanhong, DeGruttola Victor, Essex M

Affiliations

1 Harvard School of Public Health AIDS Initiative, Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, Massachusetts.

Publication information

AIDS Res Hum Retroviruses. 2015 May;31(5):531-42. doi: 10.1089/AID.2014.0211. Epub 2015 Feb 6.

Abstract

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
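The two quantities the abstract relates to clustering, the number of variable sites and the number of informative sites per window, can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: "informative" is read as parsimony-informative (at least two nucleotide states, each present in at least two sequences), gaps and ambiguity codes are ignored, and the window step size is a free parameter because the abstract gives only the window widths and counts. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter

VALID = set("ACGT")  # assumption: gaps and ambiguity codes are ignored


def site_counts(alignment):
    """Return (variable, informative) site counts for aligned sequences.

    A site is variable if it shows at least two distinct nucleotides.
    It is parsimony-informative if at least two of those nucleotides
    each occur in at least two sequences.
    """
    if not alignment:
        return 0, 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(b for b in (c.upper() for c in column) if b in VALID)
        if len(counts) >= 2:
            variable += 1
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def sliding_windows(length, width, step):
    """Yield half-open (start, end) intervals for full-width windows."""
    for start in range(0, length - width + 1, step):
        yield start, start + width


def window_site_counts(alignment, width, step):
    """Return (start, end, variable, informative) for each window."""
    length = len(alignment[0])
    return [
        (start, end, *site_counts([seq[start:end] for seq in alignment]))
        for start, end in sliding_windows(length, width, step)
    ]


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGAAC", "ATGAAC", "ATGTAG"]  # toy data, not from the study
    print(site_counts(toy))
    print(window_site_counts(toy, width=3, step=3))
```

On a real alignment, the per-window counts could then be set against the proportion of sequences in clusters for each window, which is the association the abstract reports.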
