Ultrafast clustering algorithms for metagenomic sequence analysis.

Affiliation

Center for Research in Biological Systems, University of California San Diego, USA.

Publication information

Brief Bioinform. 2012 Nov;13(6):656-68. doi: 10.1093/bib/bbs035. Epub 2012 Jul 6.

Abstract

Rapid advances in high-throughput sequencing technologies have dramatically promoted metagenomic studies of microbial communities in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations, as well as their functions and interactions. However, the massive quantity and complexity of these sequence data pose tremendous challenges for data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering analysis also addresses these challenges: a large redundant data set can be represented by a small non-redundant set, in which each cluster is represented by a single entry or a consensus; artifacts can be rapidly detected through clustering; and errors can be identified, filtered or corrected using the consensus of sequences within a cluster.
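The idea of reducing a redundant data set to one representative per cluster can be illustrated with a minimal sketch. It is not the method described in the paper: it assumes a simple greedy incremental scheme, and it uses k-mer Jaccard similarity as a stand-in for a real sequence identity measure. The function names (`kmer_set`, `jaccard`, `greedy_cluster`) and the parameters `threshold` and `k` are hypothetical and chosen only for illustration.

```python
def kmer_set(seq, k=4):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold=0.8, k=4):
    """Greedy incremental clustering (illustrative assumption, not the paper's algorithm).

    Sequences are processed longest first. Each sequence joins the first
    existing representative it is similar enough to; otherwise it becomes
    the representative of a new cluster.
    Returns a list of (representative, members) tuples.
    """
    clusters = []  # each entry: (representative, representative k-mers, members)
    for seq in sorted(seqs, key=len, reverse=True):
        kmers = kmer_set(seq, k)
        for rep, rep_kmers, members in clusters:
            if jaccard(kmers, rep_kmers) >= threshold:
                members.append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append((seq, kmers, [seq]))
    return [(rep, members) for rep, _, members in clusters]


if __name__ == "__main__":
    reads = ["ACGTACGTACGT", "ACGTACGTACGT", "TTTTGGGGCCCC", "ACGTACGTACG"]
    for rep, members in greedy_cluster(reads):
        print(rep, len(members))
```

In this sketch the list of representatives plays the role of the small non-redundant set; a consensus per cluster could be computed from `members` instead, which is where error filtering or correction would fit.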

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d259/3504929/7427950d74f7/bbs035f1.jpg
