Foutel-Rodier Théo, Thierry Agnès, Koszul Romain, Marbouty Martial
Institut Pasteur, Département Génomes et Génétique, Groupe Régulation Spatiale des Génomes, Paris Cedex 15, France; CNRS, UMR 3525, Paris Cedex 15, France; Sorbonne Université, Collège Doctoral, Paris, France; Institut Pasteur, Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), Paris, France.
Institut Pasteur, Département Génomes et Génétique, Groupe Régulation Spatiale des Génomes, Paris Cedex 15, France; CNRS, UMR 3525, Paris Cedex 15, France; Institut Pasteur, Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), Paris, France.
Methods Enzymol. 2018;612:183-195. doi: 10.1016/bs.mie.2018.08.001. Epub 2018 Sep 18.
Microbial species thrive in very diverse environments and play fundamental roles in their equilibrium and dynamics. Metagenomics consists of extracting, sequencing, and studying the DNA present in ecosystems to better understand their regulation. Ideally, the maximal amount of information would be gathered from the full sequences of the genomes, episomes, and phages present in the microbial communities. Current high-throughput DNA sequencing produces reads ranging in size from a few dozen base pairs for the most commonly used technologies to several kilobases for emerging single-molecule real-time sequencing techniques. Although valuable information can be extracted by assembling these DNA sequences into contigs, reconstructing full genomes remains a difficult task. Clustering contigs according to their similarities or read-coverage covariation gives some insight into these genomes, but remains limited because viral sequences and recently horizontally transferred genes often differ from their host genomes. We recently developed meta3C, a proximity ligation approach that bins contigs in a sequence-independent way by quantifying and exploiting their three-dimensional collision frequencies in vivo. This technique has demonstrated great potential to reconstruct genomes as well as to assign plasmids and phages to their hosts. It nevertheless requires specific processing of the microbial samples before sequencing, which has to be carefully planned.
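The idea of binning contigs by contact frequency rather than by sequence can be illustrated with a minimal sketch. This is not the meta3C pipeline itself: the input format (a dictionary of raw contact counts per contig pair), the function name `bin_contigs`, the normalization by per-contig contact totals, and the `min_score` threshold are all assumptions made for illustration, and a simple threshold followed by connected components stands in for whatever clustering method is actually used.

```python
from collections import defaultdict
from math import sqrt


def bin_contigs(contacts, min_score=0.1):
    """Group contigs linked by a normalized contact score >= min_score.

    contacts: dict mapping (contig_a, contig_b) -> raw contact count.
    This input format and the threshold are hypothetical, chosen for
    illustration only. Returns a list of sets of contig names.
    """
    # Total contacts per contig, used to normalize raw counts so that
    # highly covered contigs do not dominate.
    totals = defaultdict(int)
    for (a, b), n in contacts.items():
        totals[a] += n
        totals[b] += n

    # Union-find structure: each contig starts in its own bin.
    parent = {c: c for c in totals}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    # Merge the bins of any two contigs that contact each other often enough.
    for (a, b), n in contacts.items():
        if n <= 0:
            continue
        score = n / sqrt(totals[a] * totals[b])
        if score >= min_score:
            parent[find(a)] = find(b)

    bins = defaultdict(set)
    for c in parent:
        bins[find(c)].add(c)
    return sorted(bins.values(), key=lambda s: sorted(s))


if __name__ == "__main__":
    # Toy data: a plasmid contacts its host contig frequently, while the
    # two genomes share only one spurious contact.
    toy = {
        ("c1", "c2"): 50,
        ("c2", "plasmid"): 30,
        ("c3", "c4"): 40,
        ("c1", "c3"): 1,
    }
    for group in bin_contigs(toy):
        print(sorted(group))
```

In this toy case the plasmid contig ends up in the same bin as its host contigs without any sequence information being consulted, which is the property the abstract highlights. Real contact maps are far noisier, so a fixed threshold would likely need to be replaced by a more robust clustering step.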