The MetaGens algorithm for metagenomic database lossy compression and subject alignment.

Affiliation

Graduate Program in Health Sciences, Universidade Federal de Ciências da Saúde de Porto Alegre (UFCSPA), Rua Sarmento Leite, 245 - Centro Histórico, Porto Alegre, RS 90050-170, Brazil.

Publication information

Database (Oxford). 2023 Aug 11;2023. doi: 10.1093/database/baad053.

DOI:10.1093/database/baad053
PMID:37566631
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10419334/
Abstract

Advances in genetic sequencing techniques have led to the production of large volumes of data. Extracting genetic material from a sample is one of the early steps of a metagenomic study. As these processes evolved, analysis of the sequenced data allowed the discovery of etiological agents and, as a corollary, the diagnosis of infections. One of the biggest challenges of the technique is the huge volume of data generated with each new technology. This work introduces an algorithm that may reduce the data volume, allowing faster DNA matching against reference databases. Using techniques such as lossy compression and a substitution matrix, it is possible to match nucleotide sequences without losing the subject. The lossy compression exploits the nature of DNA mutations, insertions and deletions, and the possibility that different sequences belong to the same subject. The algorithm can reduce the overall size of the database to 15% of the original size and, depending on the parameters, to as little as 5%. Although the approach is the same as on other platforms, the matching algorithm is more sensitive because it ignores transitions and transversions, resulting in a faster way to obtain diagnostic results. In the first experiment, matching was 10 times faster than BLAST while maintaining high sensitivity. This performance gain can be extended by combining other techniques already used in other studies, such as hash tables. Database URL: https://github.com/ghc4/metagens.
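
To make the idea of a lossy, mutation-tolerant encoding concrete, here is a minimal Python sketch. It is not the published MetaGens encoding, which the abstract does not specify. It assumes one simple scheme: each base is collapsed to its chemical class (purine `R` or pyrimidine `Y`), so transitions (A↔G, C↔T) no longer change the encoded sequence, and each base fits in one bit. The function names `reduce_alphabet`, `pack_bits` and `same_subject` are hypothetical. Unlike the algorithm described in the abstract, this sketch still distinguishes transversions.

```python
# Illustrative sketch only: NOT the published MetaGens encoding.
# Assumption: purines (A, G) collapse to "R" and pyrimidines (C, T) to "Y",
# so transitions (A<->G, C<->T) disappear after encoding.

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def reduce_alphabet(seq: str) -> str:
    """Map a DNA string to a purine/pyrimidine string (lossy)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            out.append("N")  # ambiguous or unknown base
    return "".join(out)


def pack_bits(reduced: str) -> bytes:
    """Pack a reduced string at one bit per base (Y=1, anything else=0)."""
    value = 0
    for symbol in reduced:
        value = (value << 1) | (1 if symbol == "Y" else 0)
    n_bytes = (len(reduced) + 7) // 8
    return value.to_bytes(n_bytes, "big")


def same_subject(a: str, b: str) -> bool:
    """Treat two sequences as matching if their reduced forms are equal."""
    return reduce_alphabet(a) == reduce_alphabet(b)


reference = "ACGTTGCA"
variant = "GCATTGCG"       # differs from reference by transitions only
transversion = "CCGTTGCA"  # first base A -> C is a transversion

print(same_subject(reference, variant))       # transitions are ignored
print(same_subject(reference, transversion))  # transversions are not
print(len(pack_bits(reduce_alphabet(reference))), "byte(s) for 8 bases")
```

Under this assumed scheme, eight bases stored as one ASCII character each take eight bytes, while the packed form takes one. The size figures reported in the abstract come from the authors' own method and parameters, not from this sketch.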

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/dc25785e0ecb/baad053f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/a8f0325c384f/baad053f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/279599274020/baad053f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/a270363cfff7/baad053f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/27ca06befda8/baad053f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/7483961699b0/baad053f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/2debc1570ee3/baad053f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/1228f6f74316/baad053f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/6c50cb2a56f0/baad053f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/13bad3dd3a1e/baad053f10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/e3daf0f4d2d3/baad053f11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/da4d003f958f/baad053f12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/2e90dcee4dd9/baad053f13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ca8/10419334/669e80201dab/baad053f14.jpg
