Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Germany.
GFZ German Research Centre for Geosciences, Section Geomicrobiology, Potsdam, Germany.
Mol Ecol Resour. 2023 Jul;23(5):1066-1076. doi: 10.1111/1755-0998.13776. Epub 2023 Mar 20.
As most eukaryotic genomes are yet to be sequenced, the mechanisms underlying their contribution to different ecosystem processes remain largely unexplored. Although approaches to recovering prokaryotic genomes have become common in genome biology, few studies have tackled the recovery of eukaryotic genomes from metagenomes. This study assessed the reconstruction of microbial eukaryotic genomes by applying the EukRep pipeline to 6000 metagenomes from terrestrial and some transition environments. Only 215 metagenomic libraries yielded eukaryotic bins. Of the 447 eukaryotic bins recovered, 197 were classified at the phylum level. Streptophytes and fungi were the most represented clades, with 83 and 73 bins, respectively. More than 78% of the eukaryotic bins were recovered from samples whose biomes were classified as host-associated, aquatic, or anthropogenic terrestrial. However, only 93 bins were taxonomically assigned at the genus level and 17 bins at the species level. Completeness and contamination estimates were obtained for 193 bins and averaged 44.64% (σ = 27.41%) and 3.97% (σ = 6.53%), respectively. Micromonas commoda was the most frequently found taxon, while Saccharomyces cerevisiae showed the highest completeness, probably because more reference genomes are available for it. Current measures of completeness are based on the presence of single-copy genes. However, mapping the contigs of the recovered eukaryotic bins to the chromosomes of the reference genomes revealed many gaps, suggesting that completeness measures should also include chromosome coverage. The recovery of eukaryotic genomes will benefit significantly from long-read sequencing, the development of tools for dealing with repeat-rich genomes, and improved reference genome databases.
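The chromosome-coverage idea raised in the abstract can be illustrated with a minimal sketch. It assumes that contig-to-reference alignments are already available as (chromosome, start, end) intervals; the function name `chromosome_coverage`, the input layout, and the toy numbers are hypothetical and are not taken from the study, whose actual mapping procedure is not described here. The sketch merges overlapping alignments and reports the fraction of each chromosome covered by at least one contig.

```python
from collections import defaultdict


def chromosome_coverage(alignments, chrom_lengths):
    """Fraction of each reference chromosome covered by at least one contig.

    alignments    -- iterable of (chromosome, start, end), 0-based half-open
                     intervals (hypothetical input layout, assumed here)
    chrom_lengths -- dict mapping chromosome name to its length in bp
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((start, end))

    coverage = {}
    for chrom, length in chrom_lengths.items():
        covered = 0
        cur_start = cur_end = None
        # Sweep the sorted intervals, merging overlaps so that no base
        # is counted more than once.
        for start, end in sorted(by_chrom.get(chrom, [])):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
        coverage[chrom] = covered / length
    return coverage


# Toy data, invented for illustration only.
toy_lengths = {"chrI": 1000, "chrII": 500}
toy_alignments = [
    ("chrI", 0, 200),
    ("chrI", 150, 400),   # overlaps the previous alignment
    ("chrI", 700, 800),
    ("chrII", 100, 200),
]
print(chromosome_coverage(toy_alignments, toy_lengths))
# {'chrI': 0.5, 'chrII': 0.2}
```

A bin could score well on single-copy gene markers and still show low values here, which is the kind of discrepancy the abstract points to.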