Hall Michael W, Rohwer Robin R, Perrie Jonathan, McMahon Katherine D, Beiko Robert G
Faculty of Graduate Studies, Dalhousie University, Halifax, Nova Scotia, Canada.
Environmental Chemistry and Technology Program, University of Wisconsin-Madison, Madison, WI, United States of America.
PeerJ. 2017 Sep 26;5:e3812. doi: 10.7717/peerj.3812. eCollection 2017.
Taxonomic markers such as the 16S ribosomal RNA gene are widely used in microbial community analysis. A common first step in marker-gene analysis is grouping genes into clusters to reduce data sets to a more manageable size and potentially mitigate the effects of sequencing error. Instead of clustering based on sequence identity, marker-gene data sets collected over time can be clustered based on temporal correlation to reveal ecologically meaningful associations. We present Ananke, a free and open-source algorithm and software package that complements existing sequence-identity-based clustering approaches by clustering marker-gene data based on time-series profiles and provides interactive visualization of clusters, including highlighting of internal OTU inconsistencies. Ananke is able to cluster distinct temporal patterns from simulations of multiple ecological patterns, such as periodic seasonal dynamics and organism appearances/disappearances. We apply our algorithm to two longitudinal marker gene data sets: faecal communities from the human gut of an individual sampled over one year, and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the bacterial community around a food-poisoning event was immediately clear. In the freshwater lake, we found that high sequence identity between marker genes does not guarantee similar temporal dynamics, and Ananke time-series clusters revealed patterns obscured by clustering based on sequence identity or taxonomy. Ananke is free and open-source software available at https://github.com/beiko-lab/ananke.
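The idea of grouping marker-gene sequences by temporal profile rather than by sequence identity can be illustrated with a minimal sketch. It is not Ananke's implementation: the Pearson correlation measure, the single-linkage threshold grouping, the function names, the `min_corr` parameter, and the toy abundance data are all assumptions chosen for brevity, and the actual distance measure and clustering method used by the software may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_by_time_series(profiles, min_corr=0.9):
    """Group sequence IDs whose abundance profiles correlate at >= min_corr.

    profiles: dict mapping sequence ID -> list of abundances, one per time point.
    Returns a sorted list of sorted ID lists (connected components of the
    "correlated enough" graph, i.e. single-linkage grouping).
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_corr:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: abundances at 12 consecutive time points.
profiles = {
    "seq_A": [1, 2, 5, 9, 12, 9, 5, 2, 1, 1, 1, 1],      # peak early in the series
    "seq_B": [2, 4, 10, 18, 24, 18, 10, 4, 2, 2, 2, 2],  # same shape, twice the abundance
    "seq_C": [9, 5, 2, 1, 1, 1, 1, 2, 5, 9, 12, 9],      # peak late in the series
    "seq_D": [0, 0, 0, 0, 0, 0, 7, 8, 7, 8, 7, 8],       # appears mid-series
}
clusters = cluster_by_time_series(profiles)
print(clusters)
```

In this toy case `seq_A` and `seq_B` fall into one cluster because their profiles have the same shape despite different abundances, while `seq_C` and `seq_D` remain separate. No sequence information is consulted, which is why such clusters can complement, rather than replace, sequence-identity-based OTUs.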