Scalable neighbour search and alignment with uvaia.

Author information

de Oliveira Martins Leonardo, Mather Alison E, Page Andrew J

Affiliations

Quadram Institute Bioscience, Norwich, United Kingdom.

University of East Anglia, Norwich, United Kingdom.

Publication information

PeerJ. 2024 Mar 6;12:e16890. doi: 10.7717/peerj.16890. eCollection 2024.

Abstract

Despite millions of SARS-CoV-2 genomes being sequenced and shared globally, manipulating such data sets is still challenging, especially selecting sequences for focused phylogenetic analysis. We present a novel method, uvaia, which is based on partial and exact sequence similarity for quickly extracting database sequences similar to query sequences of interest. Many SARS-CoV-2 phylogenetic analyses rely on very low numbers of ambiguous sites as a measure of quality, since ambiguous sites do not contribute to single nucleotide polymorphism (SNP) differences. Uvaia overcomes this limitation by using measures of sequence similarity that consider partially ambiguous sites, allowing more ambiguous sequences to be included in the analysis if needed. Such a fine-grained definition of similarity not only allows for better phylogenetic analyses, but could also lead to improved classification and biogeographical inferences. Uvaia works natively with compressed files, can use multiple cores, and utilises memory efficiently, so it can analyse large data sets on a standard desktop.
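
To illustrate what a similarity measure that distinguishes exact from partial matches might look like, here is a minimal Python sketch. It is not uvaia's implementation: the function names `compare_sites` and `nearest_neighbours`, the bitmask encoding, and the ranking rule (fewest definite mismatches first, then most exact matches) are assumptions made only for this example. It also assumes the sequences are already aligned to the same length.

```python
# Illustrative sketch only; NOT uvaia's actual algorithm or API.
# Each IUPAC nucleotide code is encoded as a bitmask over A, C, G, T.
A, C, G, T = 1, 2, 4, 8
IUPAC = {
    "A": A, "C": C, "G": G, "T": T, "U": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T, "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T, "-": A | C | G | T, "?": A | C | G | T,
}
UNAMBIGUOUS = {A, C, G, T}


def compare_sites(query, ref):
    """Return (exact, partial, mismatch) site counts for two aligned sequences.

    exact:    both sites are the same unambiguous base
    partial:  the sites are compatible but at least one is ambiguous
    mismatch: the sites share no possible base (a definite difference)
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    exact = partial = mismatch = 0
    for q, r in zip(query.upper(), ref.upper()):
        # Unknown symbols are treated as fully ambiguous (an assumption).
        mq = IUPAC.get(q, A | C | G | T)
        mr = IUPAC.get(r, A | C | G | T)
        if mq & mr == 0:
            mismatch += 1
        elif mq == mr and mq in UNAMBIGUOUS:
            exact += 1
        else:
            partial += 1
    return exact, partial, mismatch


def nearest_neighbours(query, database, k):
    """Return the names of the k database sequences closest to the query.

    Hypothetical ranking: fewest definite mismatches, ties broken by the
    largest number of exact matches.
    """
    def key(name):
        exact, _partial, mismatch = compare_sites(query, database[name])
        return (mismatch, -exact)
    return sorted(database, key=key)[:k]


if __name__ == "__main__":
    db = {"s1": "ACGTA", "s2": "ACGNA", "s3": "TTTTT"}
    print(compare_sites("ACGTN", "ACGRA"))
    print(nearest_neighbours("ACGTA", db, 2))
```

In this sketch, a sequence with an `N` at a site is not counted as differing from the query there, but it ranks below a sequence that matches exactly at that site. This is one simple way to keep ambiguous sequences in an analysis while still preferring unambiguous agreement.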
