Investigating alignment-free machine learning methods for HIV-1 subtype classification.

Authors

Wade Kaitlyn E, Chen Lianghong, Deng Chutong, Zhou Gen, Hu Pingzhao

Affiliations

Department of Computer Science, University of Western Ontario, London, ON N6A 3K7, Canada.

Department of Biochemistry, University of Western Ontario, London, ON N6A 3K7, Canada.

Publication information

Bioinform Adv. 2024 Jul 29;4(1):vbae108. doi: 10.1093/bioadv/vbae108. eCollection 2024.

DOI: 10.1093/bioadv/vbae108
PMID: 39228995
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11371153/
Abstract

MOTIVATION

Many viruses are organized into taxonomies of subtypes based on their genetic similarities. For human immunodeficiency virus 1 (HIV-1), subtype classification plays a crucial role in infection management. Sequence alignment-based methods for subtype classification are impractical for large datasets because they are costly and time-consuming. Alignment-free methods involve creating numerical representations for genetic sequences and applying statistical or machine learning methods. Despite their high overall accuracy, existing models perform poorly on less common subtypes. Furthermore, there is limited work investigating the impact of sequence vectorization methods, in particular natural language-inspired embedding methods, on HIV-1 subtype classification.
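As an illustration of what "creating numerical representations for genetic sequences" can look like, here is a minimal sketch of k-mer frequency vectorization in plain Python. The function name `kmer_vector`, the default `k=3`, and the choice to skip k-mers containing ambiguous bases are assumptions made for this example; the abstract does not state the settings the authors used.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    seq = sequence.upper()
    # Fixed vocabulary of all len(alphabet)**k possible k-mers, so every
    # sequence maps to a vector of the same length without any alignment.
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are not in vocab and are
    # therefore ignored (an assumption of this sketch).
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


if __name__ == "__main__":
    vec = kmer_vector("ACGTACGT", k=2)
    print(len(vec))  # 16 possible 2-mers over A, C, G, T
```

Vectors of this kind have the same length for every sequence, which is what lets a standard classifier consume them directly.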

RESULTS

We present a comprehensive analysis of sequence vectorization methods across machine learning methods. We report a k-mer-based XGBoost model with a balanced accuracy of 0.84, indicating that it has good overall performance for both common and uncommon HIV-1 subtypes. We also report a Word2Vec-based support vector machine that achieves promising results on precision and balanced accuracy. Our study sheds light on the effect of sequence vectorization methods on HIV-1 subtype classification and suggests that natural language-inspired encoding methods show promise. Our results could help to develop improved HIV-1 subtype classification methods, leading to improved individual patient outcomes and the development of subtype-specific treatments.
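Balanced accuracy is commonly defined as the mean of per-class recall, which is why it reflects performance on rare subtypes better than plain accuracy does. The sketch below uses that common definition with made-up labels; it is not the authors' evaluation code, and the subtype labels and predictions are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples of a common subtype, 2 of a rare one.
    y_true = ["B"] * 8 + ["F1"] * 2
    y_pred = ["B"] * 8 + ["B", "F1"]
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(plain)                              # 0.9
    print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case one misclassified rare-subtype sample barely moves plain accuracy but lowers balanced accuracy substantially, because the rare class has recall 0.5.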

AVAILABILITY AND IMPLEMENTATION

Source code is available at https://www.github.com/kwade4/HIV_Subtypes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb03/11371153/97a3c0c645b4/vbae108f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb03/11371153/a7c8a39f0382/vbae108f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb03/11371153/5a3b75fb4fc7/vbae108f2.jpg

Similar articles

1
Investigating alignment-free machine learning methods for HIV-1 subtype classification.
Bioinform Adv. 2024 Jul 29;4(1):vbae108. doi: 10.1093/bioadv/vbae108. eCollection 2024.
2
HIV-1 M group subtype classification using deep learning approach.
Comput Biol Med. 2024 Dec;183:109218. doi: 10.1016/j.compbiomed.2024.109218. Epub 2024 Oct 5.
3
An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes.
PLoS One. 2018 Nov 14;13(11):e0206409. doi: 10.1371/journal.pone.0206409. eCollection 2018.
4
Patient Embeddings From Diagnosis Codes for Health Care Prediction Tasks: Pat2Vec Machine Learning Framework.
JMIR AI. 2023 Apr 21;2:e40755. doi: 10.2196/40755.
5
16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.
PLoS Comput Biol. 2019 Feb 26;15(2):e1006721. doi: 10.1371/journal.pcbi.1006721. eCollection 2019 Feb.
6
ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.
BMC Genomics. 2019 Apr 3;20(1):267. doi: 10.1186/s12864-019-5571-y.
7
A context-free encoding scheme of protein sequences for predicting antigenicity of diverse influenza A viruses.
BMC Genomics. 2018 Dec 31;19(Suppl 10):936. doi: 10.1186/s12864-018-5282-9.
8
A genetic analysis of HIV-1 from Punjab, India reveals the presence of multiple variants.
AIDS. 1995 Jul;9(7):685-90. doi: 10.1097/00002030-199507000-00003.
9
A novel alignment-free method for HIV-1 subtype classification.
Infect Genet Evol. 2020 Jan;77:104080. doi: 10.1016/j.meegid.2019.104080. Epub 2019 Nov 1.
10
CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning.
Methods. 2021 May;189:95-103. doi: 10.1016/j.ymeth.2020.05.018. Epub 2020 May 23.

Cited by

1
Environmental adaptations in metagenomes revealed by deep learning.
BMC Biol. 2025 Aug 11;23(1):252. doi: 10.1186/s12915-025-02361-1.
2
Genome language modeling (GLM): a beginner's cheat sheet.
Biol Methods Protoc. 2025 Mar 25;10(1):bpaf022. doi: 10.1093/biomethods/bpaf022. eCollection 2025.
3
Craft: A Machine Learning Approach to Dengue Subtyping.
