On the complexity of haplotyping a microbial community.

Author information

Nicholls Samuel M, Aubrey Wayne, De Grave Kurt, Schietgat Leander, Creevey Christopher J, Clare Amanda

Affiliations

Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK.

Department of Computer Science, Katholieke Universiteit Leuven, 3001 Leuven, Belgium.

Publication information

Bioinformatics. 2021 Jun 16;37(10):1360-1366. doi: 10.1093/bioinformatics/btaa977.

DOI:10.1093/bioinformatics/btaa977

PMID:33444437

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8208737/

Abstract

MOTIVATION

Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes.

RESULTS

The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm.

AVAILABILITY AND IMPLEMENTATION

Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
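To make the two components named above more concrete, here is a minimal Python sketch of a pairwise SNV co-occurrence structure and a greedy path through it. It is not the Hansel or Gretel API: the function names, the read representation (a dict mapping SNV site index to observed allele) and the scoring rule (raw co-occurrence counts with every previously chosen allele) are assumptions made for illustration only.

```python
from collections import defaultdict

def build_cooccurrence(reads):
    """Count how often allele a at site i and allele b at site j (i < j)
    are observed together on the same read.

    `reads` is a hypothetical representation: each read is a dict
    mapping an SNV site index to the allele seen at that site.
    """
    counts = defaultdict(int)
    for read in reads:
        sites = sorted(read)
        for x, i in enumerate(sites):
            for j in sites[x + 1:]:
                counts[(i, read[i], j, read[j])] += 1
    return counts

def greedy_haplotype(counts, n_sites, alphabet="ACGT"):
    """Build one haplotype by always taking the best-supported next allele.

    Assumed scoring rule: the support for an allele at the next site is the
    sum of its co-occurrence counts with every allele already on the path.
    """
    def support(path, site, allele):
        return sum(counts.get((i, a, site, allele), 0)
                   for i, a in enumerate(path))

    # Seed with the site-0 allele that has the most pairwise evidence overall.
    first = max(
        alphabet,
        key=lambda a: sum(v for (i, ai, _, _), v in counts.items()
                          if i == 0 and ai == a),
    )
    path = [first]
    for site in range(1, n_sites):
        path.append(max(alphabet, key=lambda b: support(path, site, b)))
    return "".join(path)

# Toy input: five short reads covering three SNV sites.
reads = [
    {0: "A", 1: "C"},
    {0: "A", 1: "C"},
    {1: "C", 2: "G"},
    {0: "T", 1: "G"},
    {1: "G", 2: "T"},
]
counts = build_cooccurrence(reads)
print(greedy_haplotype(counts, n_sites=3))  # ACG
```

The sketch recovers only the single best-supported path. Recovering further haplotypes from the same sample would need some way to discount the evidence already used, which is left out here.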

Similar articles

On the complexity of haplotyping a microbial community.

Bioinformatics. 2021 Jun 16;37(10):1360-1366. doi: 10.1093/bioinformatics/btaa977.

Multi-centre evaluation of a comprehensive preimplantation genetic test through haplotyping-by-sequencing.

Hum Reprod. 2019 Aug 1;34(8):1608-1619. doi: 10.1093/humrep/dez106.

MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data.

Bioinformatics. 2016 Sep 15;32(18):2760-7. doi: 10.1093/bioinformatics/btw312. Epub 2016 Jun 3.

PEATH: single-individual haplotyping by a probabilistic evolutionary algorithm with toggling.

Bioinformatics. 2018 Jun 1;34(11):1801-1807. doi: 10.1093/bioinformatics/bty012.

H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids.

Bioinformatics. 2016 Dec 15;32(24):3735-3744. doi: 10.1093/bioinformatics/btw537. Epub 2016 Aug 16.

A novel multifunctional haplotyping-based preimplantation genetic testing for different genetic conditions.

Hum Reprod. 2022 Oct 31;37(11):2546-2559. doi: 10.1093/humrep/deac190.

Probabilistic single-individual haplotyping.

Bioinformatics. 2014 Sep 1;30(17):i379-85. doi: 10.1093/bioinformatics/btu484.

PStrain: an iterative microbial strains profiling algorithm for shotgun metagenomic sequencing data.

Bioinformatics. 2021 Apr 1;36(22-23):5499-5506. doi: 10.1093/bioinformatics/btaa1056.

Recovery of strain-resolved genomes from human microbiome through an integration framework of single-cell genomics and metagenomics.

Microbiome. 2021 Oct 12;9(1):202. doi: 10.1186/s40168-021-01152-4.

Cited by

Overcoming challenges in metagenomic AMR surveillance with nanopore sequencing: a case study on fluoroquinolone resistance.

Front Microbiol. 2025 Jul 23;16:1614301. doi: 10.3389/fmicb.2025.1614301. eCollection 2025.

Analysis of metagenomic data.

Nat Rev Methods Primers. 2025;5. doi: 10.1038/s43586-024-00376-6. Epub 2025 Jan 23.

Strainy: phasing and assembly of strain haplotypes from long-read metagenome sequencing.

Nat Methods. 2024 Nov;21(11):2034-2043. doi: 10.1038/s41592-024-02424-1. Epub 2024 Sep 26.

Floria: fast and accurate strain haplotyping in metagenomes.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i30-i38. doi: 10.1093/bioinformatics/btae252.

Analyzing rare mutations in metagenomes assembled using long and accurate reads.

Genome Res. 2022 Nov-Dec;32(11-12):2119-2133. doi: 10.1101/gr.276917.122. Epub 2022 Nov 23.

Microbial Populations Are Shaped by Dispersal and Recombination in a Low Biomass Subseafloor Habitat.

mBio. 2022 Aug 30;13(4):e0035422. doi: 10.1128/mbio.00354-22. Epub 2022 Aug 1.

StrainXpress: strain aware metagenome assembly from short reads.

Nucleic Acids Res. 2022 Sep 23;50(17):e101. doi: 10.1093/nar/gkac543.

Enhancing Long-Read-Based Strain-Aware Metagenome Assembly.

Front Genet. 2022 May 13;13:868280. doi: 10.3389/fgene.2022.868280. eCollection 2022.

Generation and application of pseudo-long reads for metagenome assembly.

Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac044.

Functional meta-omics provide critical insights into long- and short-read assemblies.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab330.
