A comparison of methods for clustering 16S rRNA sequences into OTUs.

Affiliation

College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

Publication information

PLoS One. 2013 Aug 13;8(8):e70837. doi: 10.1371/journal.pone.0070837. eCollection 2013.

Abstract

Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, relatively little guidance is available on their relative performance or on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing thresholds for both dissimilarity and abundance in OTU inference.
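
To make the role of the dissimilarity threshold concrete, the following is a minimal sketch of threshold-based hierarchical clustering on toy data. It is not one of the ten methods evaluated in the paper and does not reproduce its results. The function names (`dissimilarity`, `cluster_otus`, `mutate`), the use of complete linkage, the mismatch-fraction distance on pre-aligned equal-length reads, and the synthetic reads are all assumptions made for illustration.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, threshold, linkage=max):
    """Agglomerative clustering of reads into OTUs (lists of read indices).

    The two closest clusters are merged repeatedly until the smallest
    inter-cluster distance exceeds `threshold`. With linkage=max this is
    complete linkage; linkage=min would give single linkage.
    """
    clusters = [[i] for i in range(len(seqs))]

    def cluster_distance(c1, c2):
        return linkage(dissimilarity(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[i] += clusters[j]  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters


def mutate(seq, positions):
    """Hypothetical helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Synthetic 100-base reads: three close variants plus two reads from a
# more distant group (about 10% away from the first three).
base = "ACGT" * 25
reads = [
    base,
    mutate(base, [0]),
    mutate(base, [0, 1]),
    mutate(base, range(10, 20)),
    mutate(base, range(10, 21)),
]

for threshold in (0.005, 0.015, 0.03):
    print(threshold, len(cluster_otus(reads, threshold)))
# On this toy data the OTU count is 5, 3, and 2 respectively.
```

Even on five reads, the inferred number of OTUs changes with the threshold, which is the sensitivity the abstract describes. Real pipelines would compute distances from proper alignments and use optimized clustering implementations rather than this quadratic-per-merge loop.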
