Nucleotide composition string selection in HIV-1 subtyping using whole genomes.

Authors

Wu Xiaomeng, Cai Zhipeng, Wan Xiu-Feng, Hoang Tin, Goebel Randy, Lin Guohui

Affiliation

Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.

Publication

Bioinformatics. 2007 Jul 15;23(14):1744-52. doi: 10.1093/bioinformatics/btm248. Epub 2007 May 11.

DOI: 10.1093/bioinformatics/btm248
PMID: 17495995
Abstract

MOTIVATION

The availability of the whole genomic sequences of HIV-1 viruses provides an excellent resource for studying the HIV-1 phylogenies using all the genetic materials. However, such huge volumes of data create computational challenges in both memory consumption and CPU usage.

RESULTS

We propose the complete composition vector representation for an HIV-1 strain, and a string scoring method to extract the nucleotide composition strings that contain the richest evolutionary information for phylogenetic analysis. In this way, a large-scale whole genome phylogenetic analysis for thousands of strains can be done both efficiently and effectively. By using 42 carefully curated strains as references, we apply our method to subtype 1156 HIV-1 strains (10.5 million nucleotides in total), which include 825 pure subtype strains and 331 recombinants. Our results show that our nucleotide composition string selection scheme is computationally efficient, and is able to define both pure subtypes and recombinant forms for HIV-1 strains using the 5000 top ranked nucleotide strings.
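The abstract does not give the formulas behind the complete composition vector or the string scoring method, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a composition vector made of relative frequencies of all substrings up to a hypothetical length `max_k`, and it uses variance across strains as a stand-in score for how informative a string is. The function names `composition_vector`, `top_strings`, and `distance` are illustrative.

```python
from collections import Counter
from statistics import pvariance


def composition_vector(seq, max_k=3):
    """Map each substring of length 1..max_k to its relative frequency in seq."""
    vec = {}
    for k in range(1, max_k + 1):
        total = len(seq) - k + 1  # number of length-k windows
        if total <= 0:
            break
        counts = Counter(seq[i:i + k] for i in range(total))
        for s, c in counts.items():
            vec[s] = c / total
    return vec


def top_strings(vectors, n):
    """Return the n strings whose frequency varies most across strains.

    Variance is an assumed placeholder score; the paper's scoring
    method is not described in the abstract.
    """
    all_strings = set().union(*vectors)
    scores = {
        s: pvariance([v.get(s, 0.0) for v in vectors]) for s in all_strings
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


def distance(u, v, strings):
    """Euclidean distance between two vectors restricted to selected strings."""
    return sum((u.get(s, 0.0) - v.get(s, 0.0)) ** 2 for s in strings) ** 0.5
```

Restricting the distance computation to a few thousand selected strings, rather than every possible substring, is what keeps memory and CPU use manageable when comparing thousands of genomes; a query strain could then be assigned the subtype of its nearest reference under such a distance.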

AVAILABILITY

The Java executable and the HIV-1 datasets are accessible through http://www.cs.ualberta.ca/~ghlin/src/WebTools/hiv.php.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
