Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection.

Affiliations

Malaviya National Institute of Technology, Jawahar Lal Nehru Marg, Jhalana Gram, Malviya Nagar, Jaipur, Rajasthan 302017, India.

Centre for Converging Technologies, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae545.

Abstract

Sequences derived from organisms that share a common evolutionary origin exhibit similarity, whereas unique sequences, absent in related organisms, are good candidates for diagnostic markers. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which are computationally demanding and difficult to parse. To address this, we developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI-RefSeq), and built a phylogenetic tree with NetworkX. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In an alternative, memory-efficient NoSQL approach, we created a MongoDB collection of RefSeq sequences from tax-ids and FASTA file paths, and queried it for the target and non-target sequences. Both approaches use an alignment-free, sliding-window k-mer procedure that quickly compares the k-mers of target and non-target sequences and returns the unique sequences absent from the non-targets. We validated the algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox, and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling accurate identification of microbial strains with high phylogenetic precision.
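The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the NAUniSeq implementation. It assumes the taxonomy arrives as (parent tax-id, child tax-id) pairs, uses a plain dictionary in place of the NetworkX graph the authors used, stores toy sequences keyed by tax-id, and applies a hypothetical rule for choosing non-targets (all relatives under an ancestor a fixed number of ranks above the target). The function names `build_tree`, `dfs_subtree`, `split_target_nontarget`, `unique_windows`, and `merge_windows` are illustrative, and the MongoDB variant is not shown.

```python
from collections import defaultdict


def build_tree(edges):
    """Build child lists and a parent map from (parent_taxid, child_taxid) pairs."""
    children = defaultdict(list)
    parent = {}
    for p, c in edges:
        children[p].append(c)
        parent[c] = p
    return children, parent


def dfs_subtree(children, root):
    """Iterative depth-first traversal: every tax-id in the subtree rooted at root."""
    stack, order = [root], []
    while stack:
        node = stack.pop()
        order.append(node)
        # reversed() so that children are visited in insertion order
        stack.extend(reversed(children.get(node, [])))
    return order


def split_target_nontarget(children, parent, target, levels_up=1):
    """Hypothetical rule: non-targets are the nodes under the ancestor
    `levels_up` ranks above the target, minus the target's own subtree."""
    ancestor = target
    for _ in range(levels_up):
        ancestor = parent.get(ancestor, ancestor)  # stays put at the root
    targets = dfs_subtree(children, target)
    target_set = set(targets)
    nontargets = [n for n in dfs_subtree(children, ancestor) if n not in target_set]
    return targets, nontargets


def kmer_set(seq, k):
    """All k-mers of seq (forward strand only in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_windows(target_seq, nontarget_seqs, k):
    """Slide a k-length window along target_seq and keep windows
    that occur in none of the non-target sequences (alignment-free)."""
    background = set()
    for s in nontarget_seqs:
        background |= kmer_set(s, k)
    return [(i, target_seq[i:i + k])
            for i in range(len(target_seq) - k + 1)
            if target_seq[i:i + k] not in background]


def merge_windows(target_seq, windows, k):
    """Join overlapping unique windows into longer candidate regions."""
    regions = []
    for i, _ in windows:
        if regions and i < regions[-1][1]:
            regions[-1][1] = i + k
        else:
            regions.append([i, i + k])
    return [target_seq[s:e] for s, e in regions]


# Toy taxonomy: 1 -> 2 -> {10, 11}, 1 -> 3 -> 12; node 10 is the target.
edges = [(1, 2), (2, 10), (2, 11), (1, 3), (3, 12)]
seqs = {10: "ACGTACGGA", 11: "ACGTACGTT", 12: "TTTTGGGG"}
children, parent = build_tree(edges)
targets, nontargets = split_target_nontarget(children, parent, target=10)
background = [seqs[n] for n in nontargets if n in seqs]
wins = unique_windows(seqs[10], background, k=4)
print(wins)                               # [(4, 'ACGG'), (5, 'CGGA')]
print(merge_windows(seqs[10], wins, k=4))  # ['ACGGA']
```

Here the sibling species 11 shares every 4-mer of the target except the last two, so only the region `ACGGA` survives as a candidate marker. The sketch checks only the forward strand with exact set membership; a practical tool would also need to handle reverse-complement k-mers and genome-scale inputs.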
