Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8568, Japan.
AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory, Tokyo, 169-8555, Japan.
Microbiome. 2019 Aug 27;7(1):119. doi: 10.1186/s40168-019-0737-z.
Elucidating the ecological and biological identity of extrachromosomal mobile genetic elements (eMGEs), such as plasmids and bacteriophages, in the human gut remains challenging due to their high complexity and diversity.
Here, we show efficient identification of eMGEs as complete circular or linear contigs from PacBio long-read metagenomic data. De novo assembly of PacBio long reads from 12 faecal samples generated 82 eMGE contigs (2.5 to 666.7 kb), which were classified as 71 plasmids and 11 bacteriophages. These included 58 novel plasmids and six bacteriophages, as well as complete genomes of five diverse crAssphages with terminal direct repeats. In a dataset of 413 gut metagenomes from five countries, many of the identified plasmids were highly abundant and prevalent. The proportion of gut plasmids detected with our plasmid data was more than twice that detected with the public database. On average, plasmids outnumbered bacterial chromosomes three to one in this metagenomic dataset. Host prediction suggested that Bacteroidetes-associated plasmids predominated, regardless of microbial abundance. The analysis identified several plasmid-enriched functions, such as inorganic ion transport, whereas antibiotic resistance genes were harboured mostly in low-abundance Proteobacteria-associated plasmids.
Overall, long-read metagenomics provided an efficient approach for unravelling the complete structure of human gut eMGEs, particularly plasmids.
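The abstract does not describe how contigs were judged complete, so the following is only an illustrative sketch of one common heuristic, not the authors' pipeline: a contig whose start and end carry the same sequence may be circular (the assembler read past the origin) or may be a linear genome with terminal direct repeats. The function name `terminal_repeat_length` and the parameters `min_len` and `max_len` are hypothetical, and the check uses exact matching only, whereas real long-read contigs would usually need alignment that tolerates errors.

```python
def terminal_repeat_length(seq: str, min_len: int = 20, max_len: int = 5000) -> int:
    """Return the length of the longest exact match between the start and
    the end of a contig, or 0 if no match of at least min_len exists.

    Assumption: exact string identity stands in for the error-tolerant
    alignment that a real pipeline would use.
    """
    seq = seq.upper()
    # The two terminal copies must not overlap, so cap at half the contig.
    longest = min(max_len, len(seq) // 2)
    for k in range(longest, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Toy contig: a 10-base terminal sequence flanking an unrelated middle.
toy_contig = "ACGTACGTAC" + "TTTTGGGGCCCC" * 3 + "ACGTACGTAC"
repeat_len = terminal_repeat_length(toy_contig, min_len=5)
```

A non-zero result only flags a candidate; deciding between a circular element and a linear genome with terminal direct repeats would need further evidence, such as read coverage across the junction.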