
Exploration and retrieval of whole-metagenome sequencing samples.

Authors

Seth Sohan, Välimäki Niko, Kaski Samuel, Honkela Antti

Affiliations

Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Espoo, Finland, Genome-Scale Biology Program and Department of Medical Genetics, University of Helsinki, Helsinki, Finland, and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

Bioinformatics. 2014 Sep 1;30(17):2471-9. doi: 10.1093/bioinformatics/btu340. Epub 2014 May 19.

DOI:10.1093/bioinformatics/btu340
PMID:24845653
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4230234/
Abstract

MOTIVATION

Over the recent years, the field of whole-metagenome shotgun sequencing has witnessed significant growth owing to the high-throughput sequencing technologies that allow sequencing genomic samples cheaper, faster and with better coverage than before. This technical advancement has initiated the trend of sequencing multiple samples in different conditions or environments to explore the similarities and dissimilarities of the microbial communities. Examples include the human microbiome project and various studies of the human intestinal tract. With the availability of ever larger databases of such measurements, finding samples similar to a given query sample is becoming a central operation.

RESULTS

In this article, we develop a content-based exploration and retrieval method for whole-metagenome sequencing samples. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as human microbiome project metagenomic samples. We observe significant enrichment for diseased gut samples in results of queries with another diseased sample and high accuracy in discriminating between different body sites even though the method is unsupervised.
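The retrieval idea described above (represent each sample by its sequence k-mers, then rank database samples by dissimilarity to a query) can be illustrated with a minimal sketch. This is not the DSM framework and not the authors' implementation: the fixed `k`, the Euclidean distance, the absence of any filtering for informative k-mers, and the function names `kmer_profile`, `dissimilarity` and `retrieve` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Relative frequency of every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def dissimilarity(p, q):
    """Euclidean distance between two k-mer frequency profiles.

    The choice of distance is an assumption for this sketch.
    """
    kmers = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in kmers))


def retrieve(query_reads, database, k=3, top=2):
    """Return the names of the `top` database samples closest to the query."""
    query_profile = kmer_profile(query_reads, k)
    scored = [
        (dissimilarity(query_profile, kmer_profile(reads, k)), name)
        for name, reads in database.items()
    ]
    return [name for _, name in sorted(scored)[:top]]


# Toy data: hypothetical sample names and reads, not real metagenomes.
database = {
    "sample_a": ["ACGTACGT"],
    "sample_b": ["ACGTACGA"],
    "sample_c": ["TTTTTTTT"],
}
print(retrieve(["ACGTACGT"], database, k=3, top=2))
```

Because the ranking uses no sample labels, it is unsupervised in the same sense as the approach in the abstract; labels such as disease status would only be needed afterwards, to check how often the top-ranked samples share the query's class.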

AVAILABILITY AND IMPLEMENTATION

A software implementation of the DSM framework is available at https://github.com/HIITMetagenomics/dsm-framework.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/af8e4a411440/btu340f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/f3c405ad522e/btu340f2p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/d275d299c914/btu340f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/a2403ecef8a2/btu340f4p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/2e4c1ac62870/btu340f5p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/5c2eb2fa922e/btu340f6p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c7c/4230234/6d6c193f98f9/btu340f7p.jpg
