Syotti: scalable bait design for DNA enrichment.

Affiliations

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Faculty of Computer Science, Dalhousie University, Halifax, Canada.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i177-i184. doi: 10.1093/bioinformatics/btac226.

DOI:10.1093/bioinformatics/btac226

PMID:35758776

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9235489/

Abstract

MOTIVATION

Bait enrichment is an increasingly common protocol because it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ('baits') is designed, manufactured and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. Metsky et al. demonstrated that bait enrichment can detect a large number of human viral pathogens within metagenomic samples.

RESULTS

We formalize the problem of designing baits by defining the Minimum Bait Cover problem, show that the problem is NP-hard even under very restrictive assumptions, and design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti shows linear scaling in practice, running at least an order of magnitude faster than state-of-the-art methods, including the method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 min to design baits for a dataset comprised of 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a subset of 17% of the data in 72 h.
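To make the covering idea concrete, here is a minimal sketch of a greedy bait-cover heuristic. It is not the Syotti implementation: the abstract does not spell out the algorithm, and the real tool relies on succinct data structures rather than the brute-force scan used here. The sketch assumes that a bait of length `bait_len` covers every window of the same length that lies within `max_mismatches` Hamming distance of it. The function and parameter names are hypothetical, and reverse-complement matching, which a practical design would likely need, is left out for brevity.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs: List[str], bait_len: int, max_mismatches: int) -> List[str]:
    """Greedy left-to-right bait selection (illustrative brute force).

    Assumption: a bait covers a window if their Hamming distance is at most
    max_mismatches. This is a hypothetical simplification, not Syotti's code.
    """
    covered = [[False] * len(s) for s in seqs]
    baits: List[str] = []
    for i, s in enumerate(seqs):
        if len(s) < bait_len:
            continue  # assumption: sequences shorter than one bait are skipped
        for pos in range(len(s)):
            if covered[i][pos]:
                continue
            # Start a new bait at the uncovered position, clamped so it fits.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Mark every window, in every sequence, that this bait matches.
            for j, t in enumerate(seqs):
                for k in range(len(t) - bait_len + 1):
                    if hamming(bait, t[k:k + bait_len]) <= max_mismatches:
                        for p in range(k, k + bait_len):
                            covered[j][p] = True
    return baits


if __name__ == "__main__":
    # Two near-identical toy sequences share baits when one mismatch is allowed.
    print(greedy_bait_cover(["ACGTACGTAC", "ACGTACGAAC"], bait_len=5, max_mismatches=1))
```

The marking step in this sketch rescans every window for each new bait, so its cost grows roughly with the number of baits times the input size. Replacing that scan with an index lookup is the kind of step where an indexed approach would be expected to pay off on large inputs.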

AVAILABILITY AND IMPLEMENTATION

https://github.com/jnalanko/syotti.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

References

Detection of Antimicrobial Resistance Genes in the Milk Production Environment: Impact of Host DNA and Sequencing Depth.

Front Microbiol. 2020 Aug 26;11:1983. doi: 10.3389/fmicb.2020.01983. eCollection 2020.

AnthOligo: automating the design of oligonucleotides for capture/enrichment technologies.

Bioinformatics. 2020 Aug 1;36(15):4353-4356. doi: 10.1093/bioinformatics/btaa552.

Metagenomic sequencing with spiked primer enrichment for viral diagnostics and genomic surveillance.

Nat Microbiol. 2020 Mar;5(3):443-454. doi: 10.1038/s41564-019-0637-9. Epub 2020 Jan 13.

Capturing the Resistome: a Targeted Capture Method To Reveal Antibiotic Resistance Determinants in Metagenomes.

Antimicrob Agents Chemother. 2019 Dec 20;64(1). doi: 10.1128/AAC.01324-19.

Capturing sequence diversity in metagenomes with comprehensive and scalable probe design.

Nat Biotechnol. 2019 Feb;37(2):160-168. doi: 10.1038/s41587-018-0006-x. Epub 2019 Feb 4.

MrBait: universal identification and design of targeted-enrichment capture probes.

Bioinformatics. 2018 Dec 15;34(24):4293-4296. doi: 10.1093/bioinformatics/bty548.

Enrichment allows identification of diverse, rare elements in metagenomic resistome-virulome sequencing.

Microbiome. 2017 Oct 17;5(1):142. doi: 10.1186/s40168-017-0361-8.

BaitsTools: Software for hybridization capture bait design.

Mol Ecol Resour. 2018 Mar;18(2):356-361. doi: 10.1111/1755-0998.12721. Epub 2017 Oct 9.

Targeted Enrichment for Pathogen Detection and Characterization in Three Felid Species.

J Clin Microbiol. 2017 Jun;55(6):1658-1670. doi: 10.1128/JCM.01463-16. Epub 2017 Mar 22.

VSEARCH: a versatile open source tool for metagenomics.

PeerJ. 2016 Oct 18;4:e2584. doi: 10.7717/peerj.2584. eCollection 2016.
