Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models.

Authors

Brady Arthur, Salzberg Steven L

Affiliation

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.

Publication information

Nat Methods. 2009 Sep;6(9):673-6. doi: 10.1038/nmeth.1358. Epub 2009 Aug 2.

Abstract

Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
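To make the idea of composition-based classification concrete, the following is a minimal sketch and not the Phymm implementation. It assumes a single fixed-order Markov model per reference genome with add-one smoothing, whereas the interpolated Markov models named in the title combine evidence from several context lengths. The function names, the toy sequences, and the labels `taxon_A` and `taxon_B` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_model(genome, order=3):
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(genome)):
        context = genome[i - order:i]
        base = genome[i]
        # Skip positions that contain ambiguous symbols such as N.
        if base in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][base] += 1
    return counts


def log_likelihood(read, counts, order=3):
    """Score a read under one model, using add-one smoothing for unseen events."""
    total = 0.0
    for i in range(order, len(read)):
        context = read[i - order:i]
        base = read[i]
        context_counts = counts.get(context, {})
        numerator = context_counts.get(base, 0) + 1
        denominator = sum(context_counts.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def classify(read, models, order=3):
    """Return the label of the model that gives the read the highest score."""
    return max(models, key=lambda label: log_likelihood(read, models[label], order))


# Toy reference "genomes" (hypothetical data): one AT-rich, one GC-rich.
genome_a = "ATATTAATTTAAATATTA" * 20
genome_b = "GCGGCCGCGCCGGGCGCC" * 20

models = {
    "taxon_A": train_markov_model(genome_a),
    "taxon_B": train_markov_model(genome_b),
}

print(classify("TTAATATTAAAT", models))  # taxon_A
print(classify("GCCGCGCCGGGC", models))  # taxon_B
```

In this sketch each read is assigned to whichever reference model explains its base composition best. A real classifier would train on full reference genomes, use longer and variable-length contexts, and report a taxonomic label at several levels rather than a single toy name.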

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1290/2762791/55be833d36e4/nihms128792f1.jpg
