Bio-sciences R&D Division, TCS Innovation Labs, Tata Consultancy Services Limited, 1 Software Units Layout, Madhapur, Hyderabad - 500081, Andhra Pradesh, India.
BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S9. doi: 10.1186/1471-2105-12-S13-S9. Epub 2011 Nov 30.
One of the primary goals of comparative metagenomic projects is to study the differences between the microbial communities residing in diverse environments. Besides providing valuable insights into the inherent structure of microbial populations, these studies have potential applications in several important areas of medical research, such as disease diagnostics, detection of pathogenic contamination and identification of hitherto unknown pathogens. Here we present a novel, rapid, alignment-free method called HabiSign, which uses patterns of tetranucleotide usage in microbial genomes to bring out differences in the composition of both diverse and related microbial communities.
Validation results show that, in comparison with an average tetranucleotide frequency based approach and the recently published dinucleotide relative abundance based approach, the metagenomic signatures obtained using the HabiSign method accurately cluster metagenomes at the biome, phenotypic and species levels. More importantly, the method is able to identify subsets of sequences that are specific to a particular habitat. In addition, because it is alignment-free, the method can rapidly compare and group multiple metagenomic data sets.
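To make the average tetranucleotide frequency baseline mentioned above concrete, the following is a minimal sketch of that general idea, not the HabiSign algorithm itself, whose details are not given here. It assumes that each data set is a list of nucleotide strings, that windows containing symbols other than A, C, G and T are skipped, and that profiles are compared by Euclidean distance. The function names (`tetra_freq`, `average_profile`, `euclidean`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so every profile shares the same axes.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetra_freq(seq):
    """Return the 256-element tetranucleotide frequency vector of one sequence."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])  # windows with N or other symbols are skipped
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:  # sequence shorter than 4 bases, or no valid window
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


def average_profile(seqs):
    """Average the per-sequence frequency vectors of one data set (assumed baseline)."""
    if not seqs:
        raise ValueError("a data set needs at least one sequence")
    vectors = [tetra_freq(s) for s in seqs]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(TETRAMERS))]


def euclidean(a, b):
    """Euclidean distance between two profiles (an assumed choice of metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data sets, invented for illustration only.
sample_a = ["ACGTACGTACGT", "ACGTACGTAAAA"]
sample_b = ["GGGGCCCCGGGG", "GGCCGGCCGGCC"]
sample_c = ["ACGTACGTACGA"]

profile_a = average_profile(sample_a)
profile_b = average_profile(sample_b)
profile_c = average_profile(sample_c)

print("distance a-b:", euclidean(profile_a, profile_b))
print("distance a-c:", euclidean(profile_a, profile_c))
```

In this toy case, `sample_a` and `sample_c` share most of their tetranucleotides and so lie closer together than `sample_a` and `sample_b`. A pairwise distance matrix built this way could be passed to any standard clustering routine; no sequence alignment is needed at any step, which is what makes approaches of this kind fast.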
The proposed method is expected to be broadly applicable across metagenomic research, in areas ranging from disease diagnostics and pathogen detection to bio-prospecting. A web-server for the HabiSign algorithm is available at http://metagenomics.atc.tcs.com/HabiSign/.