Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA.
Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
Sci Data. 2018 Jan 16;5:170203. doi: 10.1038/sdata.2017.203.
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers can study the metabolic potential of Bacteria and Archaea that resist isolation in the laboratory. Using the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we assembled 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned a phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available to the research community at large.