Syotti: scalable bait design for DNA enrichment.

Affiliations

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Faculty of Computer Science, Dalhousie University, Halifax, Canada.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i177-i184. doi: 10.1093/bioinformatics/btac226.

Abstract

MOTIVATION

Bait enrichment is a protocol that is becoming increasingly common, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ('baits') is designed, manufactured and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. Metsky et al. demonstrated that bait enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples.

RESULTS

We formalize the problem of designing baits by defining the Minimum Bait Cover problem, show that the problem is NP-hard even under very restrictive assumptions, and design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti scales linearly in practice, and it runs at least an order of magnitude faster than state-of-the-art methods, including the method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 min to design baits for a dataset comprising 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a subset of 17% of the data in 72 h.
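To make the covering idea concrete, here is a minimal sketch of a naive greedy heuristic for a bait-cover problem. It is an illustration under stated assumptions, not Syotti's implementation: the abstract does not spell out the formal definition or the algorithm, so this sketch assumes that a bait covers a window of the same length whenever their Hamming distance is at most a mismatch threshold. The function names and parameters (`greedy_bait_cover`, `bait_len`, `max_mismatches`) are hypothetical. The sketch scans every window by brute force instead of using succinct data structures, and it ignores reverse complements.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs, bait_len, max_mismatches):
    """Naive greedy cover (illustrative only, not Syotti's algorithm).

    Assumption: a bait covers a length-`bait_len` window of a sequence
    when the Hamming distance between bait and window is at most
    `max_mismatches`.
    """
    if any(len(s) < bait_len for s in seqs):
        raise ValueError("every sequence must be at least bait_len long")

    covered = [[False] * len(s) for s in seqs]
    baits = []
    for si, s in enumerate(seqs):
        for pos in range(len(s)):
            if covered[si][pos]:
                continue
            # Take the window starting at the first uncovered position,
            # shifted left if it would run past the end of the sequence.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Brute-force scan: mark every window the new bait covers.
            for sj, t in enumerate(seqs):
                for k in range(len(t) - bait_len + 1):
                    if hamming(bait, t[k:k + bait_len]) <= max_mismatches:
                        for p in range(k, k + bait_len):
                            covered[sj][p] = True
    return baits


if __name__ == "__main__":
    genomes = ["ACGTACGT", "ACGTACGA"]
    print(greedy_bait_cover(genomes, bait_len=4, max_mismatches=1))
    print(greedy_bait_cover(genomes, bait_len=4, max_mismatches=0))
```

With one mismatch allowed, a single bait covers both toy sequences; with exact matching, a second bait is needed for the variant window. The brute-force scan costs time roughly proportional to the number of baits times the total input length, which is the kind of cost an index-based approach is meant to avoid.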

AVAILABILITY AND IMPLEMENTATION

https://github.com/jnalanko/syotti.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
