
INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences.

Affiliation

Bio-sciences R&D Division, TCS Innovation Labs, Tata Consultancy Services Limited, 1 Software Units Layout, Madhapur, Hyderabad - 500081, Andhra Pradesh, India.

Publication information

BMC Genomics. 2011 Nov 30;12 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-12-S3-S4.

DOI:10.1186/1471-2164-12-S3-S4
PMID:22369237
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3333187/
Abstract

BACKGROUND

Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types: similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups with modest computational resources. In this study, we present INDUS, a composition-based approach that incorporates the following novel features. First, INDUS discards the 'one genome-one composition' model adopted by existing compositional approaches. Second, INDUS uses 'compositional distance' information for identifying appropriate assignment levels. Third, INDUS incorporates steps that attempt to reduce biases due to database representation.
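The general idea behind composition-based classification can be illustrated with a minimal sketch. This is not the INDUS algorithm: the abstract does not specify its features or distance measure, so the tetranucleotide frequencies, the Euclidean distance, and the function names below (`composition_vector`, `compositional_distance`, `nearest_reference`) are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
import math

K = 4  # assumed oligonucleotide length (tetranucleotides); not taken from the paper
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def composition_vector(seq):
    """Return normalized K-mer frequencies of seq in a fixed k-mer order.

    K-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[kmer] for kmer in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[kmer] / total for kmer in KMERS]


def compositional_distance(u, v):
    """Euclidean distance between two composition vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_reference(read, references):
    """Return (label, distance) of the reference closest in composition to read.

    references maps a hypothetical taxon label to a reference sequence.
    """
    read_vec = composition_vector(read)
    scored = [
        (compositional_distance(read_vec, composition_vector(ref)), label)
        for label, ref in references.items()
    ]
    distance, label = min(scored)
    return label, distance


# Toy, made-up sequences; real references would be genome fragments.
references = {
    "at_rich": "ATATTTAATATAAATTATATAT",
    "gc_rich": "GCGCGGCCGCGCCGGCGCGC",
}
label, distance = nearest_reference("TATATAAATTTATATA", references)
print(label, round(distance, 3))
```

In a sketch like this, the distance to the nearest reference could in principle inform how specific an assignment should be (a small distance supporting a more specific taxonomic level), which is the kind of role the abstract attributes to 'compositional distance'. The actual thresholds and procedure used by INDUS are not described here.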

RESULTS

INDUS rapidly classifies sequences in both simulated and real metagenomic sequence data sets, with classification efficiency significantly higher than that of existing composition-based approaches. Its classification efficiency is comparable to that of similarity-based approaches, while its binning time is 23-33 times lower than that of alignment-based approaches.

CONCLUSION

Given its rapid execution time and high classification efficiency, INDUS is expected to be of immense interest to researchers working in metagenomics and microbial ecology.

AVAILABILITY

A web-server for the INDUS algorithm is available at http://metagenomics.atc.tcs.com/INDUS/



