GenBank as a source to monitor and analyze Host-Microbiome data.

Author information

Center of Computational Molecular Biology, Brown University, Providence, RI, USA.

Center for Biomedical Informatics, Brown University, Providence, RI, USA.

Publication information

Bioinformatics. 2022 Sep 2;38(17):4172-4177. doi: 10.1093/bioinformatics/btac487.

Abstract

MOTIVATION

Microbiome datasets are often constrained by sequencing limitations. GenBank, maintained by the National Center for Biotechnology Information (NCBI), is the largest collection of publicly available DNA sequences. The metadata of GenBank records are a largely understudied resource and may be uniquely leveraged to access the sum of prior studies focused on microbiome composition. Here, we developed a computational pipeline to analyze GenBank metadata, which contain data on hosts, microorganisms and their place of origin. This work provides the first opportunity to leverage the totality of GenBank to shed light on the compositional data practices that shape how microbiome datasets are formed, as well as to examine host-microbiome relationships.

RESULTS

The collected dataset spans multiple kingdoms of microorganisms, including bacteria, viruses, archaea, protozoa, fungi and invertebrate parasites, and hosts of multiple taxonomic classes, including mammals, birds and fish. A human subset of this dataset provides insight into gaps in current microbiome data collection, which is biased towards clinically relevant pathogens. Clustering and phylogenetic analyses show the potential of these data for modeling host taxonomy and evolution, revealing groupings formed by host diet, environment and coevolution.

AVAILABILITY AND IMPLEMENTATION

GenBank Host-Microbiome Pipeline is available at https://github.com/bcbi/genbank_holobiome. The GenBank loader is available at https://github.com/bcbi/genbank_loader.
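
As a rough illustration of the kind of metadata involved, GenBank flat-file records carry `source` feature qualifiers such as `/organism`, `/host` and `/country`. The sketch below is not taken from the repositories above; it is a minimal standard-library Python example, and the sample record and the function names `source_qualifiers` and `host_microbe_pair` are hypothetical. It assumes single-line qualifier values and the usual flat-file indentation (feature keys at five spaces, qualifiers indented further).

```python
import re

# Hypothetical record fragment, written by hand for illustration only.
SAMPLE_RECORD = '''\
FEATURES             Location/Qualifiers
     source          1..1701
                     /organism="Influenza A virus"
                     /mol_type="viral cRNA"
                     /host="Homo sapiens"
                     /country="USA: Rhode Island"
     gene            1..1701
                     /gene="HA"
'''

# Matches a single-line qualifier such as /host="Homo sapiens".
QUALIFIER = re.compile(r'^\s+/(\w+)="([^"]*)"\s*$')


def source_qualifiers(record_text):
    """Return the qualifiers of the `source` feature as a dict.

    Assumption: qualifier values fit on one line; wrapped values are skipped.
    """
    qualifiers = {}
    in_source = False
    for line in record_text.splitlines():
        # A feature key line is indented by exactly five spaces.
        if line.startswith("     ") and not line.startswith("      "):
            in_source = line.split()[0] == "source"
            continue
        if in_source:
            match = QUALIFIER.match(line)
            if match:
                qualifiers[match.group(1)] = match.group(2)
    return qualifiers


def host_microbe_pair(record_text):
    """Reduce a record to the three fields named in the abstract."""
    q = source_qualifiers(record_text)
    return {
        "microbe": q.get("organism"),
        "host": q.get("host"),
        "origin": q.get("country"),
    }


if __name__ == "__main__":
    print(host_microbe_pair(SAMPLE_RECORD))
```

Records that lack a `/host` or `/country` qualifier yield `None` for that field, so a caller can decide whether to drop or keep incomplete entries.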

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
