Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection.

Affiliations

Malaviya National Institute of Technology, Jawahar Lal Nehru Marg, Jhalana Gram, Malviya Nagar, Jaipur, Rajasthan 302017, India.

Centre for Converging Technologies, University of Rajasthan, Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004, India.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae545.

DOI: 10.1093/bib/bbae545
PMID: 39441245
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11497845/

Abstract

Sequences derived from organisms that share a common evolutionary origin are similar, while unique sequences, absent in related organisms, are good candidates for diagnostic markers. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we have developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI-RefSeq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database using the tax-id and the path of the FASTA files, and queried this collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares the k-mers of target and non-target sequences and returns the unique sequences that are not present in the non-target. We validated our algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox, and generated unique sequences for each. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
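The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the NAUniSeq implementation: a plain dictionary stands in for the NetworkX graph so the example has no dependencies, and the function names (`subtree`, `kmers`, `unique_stretches`, `find_unique`), the toy taxonomy, and the rule that every node under a chosen ancestor outside the target subtree counts as non-target are all assumptions made for illustration.

```python
def subtree(tree, root):
    """Collect root and all its descendants with an iterative depth-first search."""
    visited, stack = [], [root]
    while stack:
        node = stack.pop()
        visited.append(node)
        stack.extend(tree.get(node, []))  # children of this taxon, if any
    return visited


def kmers(seq, k):
    """Return the set of k-mers seen by a window of width k slid one base at a time."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_stretches(seq, background, k):
    """Merge consecutive windows whose k-mer is absent from the background set."""
    seq = seq.upper()
    stretches, start, end = [], None, None
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] not in background:
            if start is None:
                start = i
            end = i + k  # extend the current stretch to cover this window
        elif start is not None:
            stretches.append(seq[start:end])
            start = None
    if start is not None:
        stretches.append(seq[start:end])
    return stretches


def find_unique(tree, sequences, target, ancestor, k):
    """Return {node: [unique stretches]} for sequences in the target subtree.

    Assumption: non-target nodes are all nodes under `ancestor` that are
    not part of the target subtree.
    """
    target_nodes = set(subtree(tree, target))
    non_target_nodes = set(subtree(tree, ancestor)) - target_nodes

    # Pool the k-mers of every non-target sequence into one background set.
    background = set()
    for node in non_target_nodes:
        for seq in sequences.get(node, []):
            background |= kmers(seq, k)

    result = {}
    for node in target_nodes:
        found = []
        for seq in sequences.get(node, []):
            found.extend(unique_stretches(seq, background, k))
        if found:
            result[node] = found
    return result


# Toy taxonomy (parent -> children) and toy sequences keyed by node name.
toy_tree = {
    "root": ["genus"],
    "genus": ["target_sp", "sibling_sp"],
    "target_sp": ["strain1"],
}
toy_sequences = {
    "strain1": ["AAAACCCCGGGG"],
    "sibling_sp": ["AAAACCCC"],
}

if __name__ == "__main__":
    print(find_unique(toy_tree, toy_sequences, "target_sp", "genus", k=4))
```

In this toy case the sibling species shares the first eight bases with the target strain, so only the windows that reach into the trailing `GGGG` survive the comparison. Because membership is tested against a hash set, no alignment is needed; the cost is memory proportional to the number of distinct non-target k-mers, which is presumably the pressure the MongoDB variant mentioned in the abstract is meant to relieve.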

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/d23f86a5009e/bbae545f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/178e537e2f70/bbae545f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/d2b59d1a87d7/bbae545f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/f7901e071077/bbae545f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/8136e26fa39e/bbae545f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/e235a967b57f/bbae545f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/fd5f7f4c3af6/bbae545f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b308/11497845/28f6b2a5a1f9/bbae545f7.jpg
