MetaGeneHunt for protein domain annotation in short-read metagenomes.

Author information

Department of Biological Sciences, California State University, Long Beach, California, USA.

Department of Bioinformatics, University of Georgia Athens, Athens, Georgia, USA.

Publication information

Sci Rep. 2020 May 7;10(1):7712. doi: 10.1038/s41598-020-63775-1.

Abstract

The annotation of short-read metagenomes is essential for understanding the functional potential of sequenced microbial communities. Annotation techniques based solely on the identification of local matches tend to confound local sequence similarity with overall protein homology and thus do not reflect the complex multidomain architecture and the shuffling of functional domains in many protein families. Here, we present MetaGeneHunt to identify specific protein domains and to normalize the hit counts based on the domain length. We used MetaGeneHunt to investigate the potential for carbohydrate processing in the mouse gastrointestinal tract. We sampled, sequenced, and analyzed the microbial communities associated with the bolus in the stomach, intestine, cecum, and colon of five captive mice. Focusing on Glycoside Hydrolases (GHs), we found that, across samples, 58.3% of the 4,726,023 short-read sequences matching a GH domain-containing protein were located outside the domain of interest. Next, before comparing the samples, the counts of localized hits matching the domains of interest were normalized to account for the corresponding domain length. Microbial communities in the intestine and cecum displayed characteristic GH profiles matching distinct microbial assemblages. Conversely, the stomach and colon were associated with structurally and functionally more diverse and variable microbial communities. Across samples, despite fluctuations, changes in the functional potential for carbohydrate processing correlated with changes in community composition. Overall, MetaGeneHunt is a new way to quickly and precisely identify discrete protein domains in sequenced metagenomes processed with MG-RAST. In addition, using the sister program "GeneHunt" to create custom Reference Annotation Tables, MetaGeneHunt provides an unprecedented way to (re)investigate the precise distribution of any protein domain in short-read metagenomes.
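
The two steps described in the abstract (keeping only hits that fall within a domain of interest, then normalizing counts by domain length) can be illustrated with a minimal Python sketch. This is not the MetaGeneHunt implementation: the `Domain` and `Hit` records, the one-residue overlap rule, the per-family reference lengths, and the "hits per residue" normalization are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Domain(NamedTuple):
    """A domain annotated on a reference protein (hypothetical record)."""
    family: str  # e.g. "GH13"
    start: int   # 1-based, inclusive position on the reference protein
    end: int     # 1-based, inclusive


class Hit(NamedTuple):
    """A short-read match against a reference protein (hypothetical record)."""
    protein_id: str
    start: int   # 1-based, inclusive position of the match on the protein
    end: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def count_domain_hits(
    hits: Iterable[Hit],
    domains_by_protein: Dict[str, List[Domain]],
    min_overlap: int = 1,  # assumed threshold, not taken from the paper
) -> Tuple[Dict[str, int], int]:
    """Count hits per domain family and hits that miss every domain."""
    counts: Dict[str, int] = defaultdict(int)
    outside = 0
    for hit in hits:
        matched = False
        for dom in domains_by_protein.get(hit.protein_id, []):
            if overlap(hit.start, hit.end, dom.start, dom.end) >= min_overlap:
                counts[dom.family] += 1
                matched = True
        if not matched:
            outside += 1
    return dict(counts), outside


def normalize_by_length(
    counts: Dict[str, int], family_length: Dict[str, int]
) -> Dict[str, float]:
    """Divide each family's hit count by its domain length (hits per residue)."""
    return {fam: n / family_length[fam] for fam, n in counts.items()}


# Toy data, invented for illustration only.
domains = {
    "P1": [Domain("GH13", 50, 349)],
    "P2": [Domain("GH5", 10, 259)],
}
hits = [
    Hit("P1", 60, 90),    # inside GH13
    Hit("P1", 400, 430),  # matches the protein but lies outside the domain
    Hit("P2", 1, 20),     # partially overlaps GH5
    Hit("P2", 300, 330),  # outside the domain
]
counts, outside = count_domain_hits(hits, domains)
normalized = normalize_by_length(counts, {"GH13": 300, "GH5": 250})
print(counts, outside, normalized)
```

Without the length normalization, longer domains would accumulate more hits simply because they offer more positions for a short read to match, which is the bias the abstract says the normalization accounts for.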

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a112/7205989/ff7ecc5235e1/41598_2020_63775_Fig1_HTML.jpg
