Catwalk: identifying closely related sequences in large microbial sequence databases.

Affiliations

Nuffield Department of Medicine, University of Oxford, Oxford, UK.

Present address: UKRI Science and Technologies Facilities Council, Harwell, UK.

Publication information

Microb Genom. 2022 Jun;8(6). doi: 10.1099/mgen.0.000850.

DOI:10.1099/mgen.0.000850
PMID:35771206
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9455716/
Abstract

There is a need to identify microbial sequences that may form part of transmission chains, or that may represent importations across national boundaries, amidst large numbers of SARS-CoV-2 and other bacterial or viral sequences. Reference-based compression is a sequence analysis technique that allows both compact storage of sequence data and comparisons between sequences. Published implementations of the approach are being challenged by the large sample collections now being generated. Our aim was to develop fast software for detecting highly similar sequences in large collections of microbial genomes, including millions of SARS-CoV-2 genomes. To do so, we developed Catwalk, a tool that bypasses bottlenecks in the generation, comparison and in-memory storage of microbial genomes generated by reference mapping. It is a compiled solution, coded in Nim to increase performance. It can be accessed via command line, REST API or web server interfaces. We tested Catwalk using both SARS-CoV-2 and Mycobacterium tuberculosis genomes generated by prospective public-health sequencing programmes. Pairwise sequence comparisons, using clinically relevant similarity cut-offs, took about 0.39 and 0.66 μs, respectively; in 1 s, between 1 and 2 million sequences can be searched. Catwalk operates about 1700 times faster than, and uses about 8% of the RAM of, a Python reference-based compression and comparison tool in current use for outbreak detection. Catwalk can rapidly identify close relatives of a SARS-CoV-2 or Mycobacterium tuberculosis genome amidst millions of samples.
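
To make the idea of reference-based compression concrete, here is a minimal Python sketch. It is not Catwalk's implementation (the abstract does not describe its internal data structures, and Catwalk itself is written in Nim). The function names `compress`, `snp_distance` and `neighbours`, the dictionary layout, and the rule that positions unknown in either sequence are ignored are all assumptions made for illustration. Each reference-mapped sequence is stored only as the positions where it differs from the reference, and two sequences are compared with set operations on those positions.

```python
from typing import Dict, List, Optional, Set, Tuple

BASES = "ACGT"


def compress(seq: str, ref: str) -> dict:
    """Keep only the positions where `seq` differs from `ref`.

    Hypothetical layout: one position set per called base, plus a set of
    positions with no confident call (N, '-', ambiguity codes).
    """
    if len(seq) != len(ref):
        raise ValueError("sequence must be mapped to the reference length")
    diffs: Dict[str, Set[int]] = {b: set() for b in BASES}
    unknown: Set[int] = set()
    for pos, (s, r) in enumerate(zip(seq.upper(), ref.upper())):
        if s == r:
            continue  # identical to the reference: nothing to store
        if s in diffs:
            diffs[s].add(pos)
        else:
            unknown.add(pos)
    return {"diffs": diffs, "unknown": unknown}


def snp_distance(a: dict, b: dict, cutoff: Optional[int] = None) -> Optional[int]:
    """Count positions where two compressed sequences carry different bases.

    Positions unknown in either sequence are skipped (an assumption).
    Returns None when the distance exceeds `cutoff`.
    """
    differing: Set[int] = set()
    for base in BASES:
        # A position differs if exactly one of the two sequences has `base` there.
        differing |= a["diffs"][base] ^ b["diffs"][base]
    differing -= a["unknown"] | b["unknown"]
    distance = len(differing)
    if cutoff is not None and distance > cutoff:
        return None
    return distance


def neighbours(query: dict, database: Dict[str, dict], cutoff: int) -> List[Tuple[str, int]]:
    """Return (name, distance) for every stored sequence within `cutoff` of `query`."""
    hits = []
    for name, other in database.items():
        d = snp_distance(query, other, cutoff)
        if d is not None:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])
```

Because closely related genomes differ from the reference at only a handful of positions, the stored sets stay small. This is what makes both compact storage and fast pairwise comparison plausible at the scale the abstract describes.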

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc12/9455716/4a567c844192/mgen-8-850-g001.jpg
