On the complexity of haplotyping a microbial community.

Author information

Nicholls Samuel M, Aubrey Wayne, De Grave Kurt, Schietgat Leander, Creevey Christopher J, Clare Amanda

Affiliations

Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK.

Department of Computer Science, Katholieke Universiteit Leuven, 3001 Leuven, Belgium.

Publication information

Bioinformatics. 2021 Jun 16;37(10):1360-1366. doi: 10.1093/bioinformatics/btaa977.

Abstract

MOTIVATION

Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes.

RESULTS

The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm.
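The two components named above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: reads have already been reduced to the bases they carry at each SNV index, the alphabet is fixed to ACGT, the traversal looks back over a fixed window of previously chosen SNVs, and a crude reweighting step ("subtract the weakest adjacent support along the chosen path") peels off one haplotype at a time. The function names and the input format are hypothetical; this is not the Hansel or Gretel API, and the path selection and reweighting here are deliberately simpler than Gretel's.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input format (not Hansel's): each read maps an SNV index
# (0..n_snvs-1) to the base observed at that SNV on the read.
Read = dict[int, str]


def build_pair_counts(reads: list[Read]) -> Counter:
    """Pairwise SNV co-occurrence counts keyed by (i, base_i, j, base_j), i < j."""
    counts: Counter = Counter()
    for read in reads:
        # Every pair of SNVs covered by the same read, not only adjacent ones.
        for i, j in combinations(sorted(read), 2):
            counts[(i, read[i], j, read[j])] += 1
    return counts


def support(counts: Counter, path: list[str], j: int, b: str, window: int) -> int:
    """Evidence for placing base b at SNV j, given the bases already on the path."""
    if j == 0:
        # No history yet: use every pair that starts with base b at SNV 0.
        return sum(c for (i, a, _, _), c in counts.items() if i == 0 and a == b)
    # Sum co-occurrences with the bases chosen at the previous `window` SNVs.
    return sum(counts[(i, path[i], j, b)] for i in range(max(0, j - window), j))


def greedy_path(counts: Counter, n_snvs: int, window: int, alphabet: str = "ACGT") -> str:
    """Greedily extend one path SNV by SNV; ties go to the earlier letter."""
    path: list[str] = []
    for j in range(n_snvs):
        path.append(max(alphabet, key=lambda b: support(counts, path, j, b, window)))
    return "".join(path)


def recover_haplotypes(reads: list[Read], n_snvs: int, k: int, window: int = 3) -> list[str]:
    """Extract up to k paths, removing the evidence each path 'uses' before the next."""
    counts = build_pair_counts(reads)
    haplotypes: list[str] = []
    for _ in range(k):
        path = greedy_path(counts, n_snvs, window)
        # Weakest adjacent link on the path is taken as the evidence it consumes.
        used = min(
            (counts[(i, path[i], i + 1, path[i + 1])] for i in range(n_snvs - 1)),
            default=0,
        )
        if used == 0:
            break  # best remaining path is unsupported by any read pair
        haplotypes.append(path)
        # Subtract that evidence from every pair on the path within the window.
        for j in range(n_snvs):
            for i in range(max(0, j - window), j):
                key = (i, path[i], j, path[j])
                counts[key] = max(0, counts[key] - used)
    return haplotypes


if __name__ == "__main__":
    # Toy community: haplotype ACGT seen on 6 reads, TGCA on 4 reads.
    reads = (
        [{0: "A", 1: "C", 2: "G"}] * 3
        + [{1: "C", 2: "G", 3: "T"}] * 3
        + [{0: "T", 1: "G", 2: "C"}] * 2
        + [{1: "G", 2: "C", 3: "A"}] * 2
    )
    print(recover_haplotypes(reads, n_snvs=4, k=3))  # ['ACGT', 'TGCA']
```

Counting every pair of SNVs on a read, rather than only neighbouring ones, is what lets evidence from a single read link variants that are several positions apart; the look-back window controls how much of that longer-range evidence each extension step consults.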

AVAILABILITY AND IMPLEMENTATION

Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
