Jangid Kamlesh, Kao Ming-Hung, Lahamge Aishwarya, Williams Mark A, Rathbun Stephen L, Whitman William B
Department of Microbiology, University of Georgia, Athens, Georgia, United States of America.
Microbial Culture Collection, National Centre for Cell Science, Savitribai Phule Pune University, Pune, Maharashtra, India.
PLoS One. 2016 Dec 2;11(12):e0167634. doi: 10.1371/journal.pone.0167634. eCollection 2016.
K-shuff is a new algorithm for comparing the similarity of gene sequence libraries. It provides measures of structural and compositional diversity as well as the significance of the differences between these measures. Inspired by Ripley's K-function for spatial point pattern analysis, the Intra K-function (IKF) measures the structural diversity within a library, including both the richness and the overall similarity of the sequences. The Cross K-function (CKF) measures the compositional diversity between gene libraries, reflecting both the number of operational taxonomic units (OTUs) shared and the overall similarity in OTUs. A Monte Carlo testing procedure then enables statistical evaluation of both the structural and compositional diversity between gene libraries. For 16S rRNA gene libraries from complex bacterial communities such as those found in seawater, salt marsh sediments, and soils, K-shuff yields reproducible estimates of structural and compositional diversity with libraries of more than 50 sequences. Similarly, for pyrosequencing libraries generated from a glacial retreat chronosequence and Illumina® libraries generated from US homes, K-shuff required >300 and 100 sequences per sample, respectively. Power analyses demonstrated that K-shuff is sensitive to small differences in Sanger or Illumina® libraries. This sensitivity enabled examination of compositional differences at much deeper taxonomic levels, such as within abundant OTUs, which is especially useful when comparing communities that are compositionally very similar but functionally different. K-shuff will therefore prove beneficial for conventional microbiome analysis as well as specific hypothesis testing.
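The general idea behind the IKF, the CKF, and the Monte Carlo test can be illustrated with a minimal sketch. This is not the published K-shuff implementation: the abstract does not give the exact definitions, so everything here is an assumption made for illustration. The sketch assumes pre-aligned sequences of equal length, uses a simple mismatch fraction as the distance, defines hypothetical `intra_k` and `cross_k` functions as the fraction of sequence pairs within a distance `d`, and uses a hypothetical `cross_statistic` together with a label-shuffling test.

```python
import random
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intra_k(library, d):
    """Assumed IKF analogue: fraction of distinct within-library pairs at distance <= d."""
    pairs = list(combinations(library, 2))
    return sum(hamming_fraction(a, b) <= d for a, b in pairs) / len(pairs)


def cross_k(lib_a, lib_b, d):
    """Assumed CKF analogue: fraction of between-library pairs at distance <= d."""
    hits = sum(hamming_fraction(a, b) <= d for a in lib_a for b in lib_b)
    return hits / (len(lib_a) * len(lib_b))


def cross_statistic(lib_a, lib_b, grid):
    """Hypothetical statistic: summed squared gap between the mean intra curve
    and the cross curve over a grid of distances."""
    return sum(
        ((intra_k(lib_a, d) + intra_k(lib_b, d)) / 2 - cross_k(lib_a, lib_b, d)) ** 2
        for d in grid
    )


def monte_carlo_p(lib_a, lib_b, grid, n_perm=199, seed=0):
    """Monte Carlo p-value obtained by shuffling library labels over the pooled sequences."""
    rng = random.Random(seed)
    observed = cross_statistic(lib_a, lib_b, grid)
    pooled = list(lib_a) + list(lib_b)
    n_a = len(lib_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cross_statistic(pooled[:n_a], pooled[n_a:], grid) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

A call such as `monte_carlo_p(lib_a, lib_b, grid=[0.0, 0.25, 0.5])` on two lists of aligned sequences returns a p-value for the null hypothesis that both libraries were drawn from the same pool. Shuffling labels is one common way to build a Monte Carlo reference distribution; whether K-shuff uses this exact scheme is not stated in the abstract.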