Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK.
Zymo Research Corporation, 17062 Murphy Ave., Irvine, CA 92614, USA.
GigaScience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz043.
Long sequencing reads are information-rich: they aid de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysing long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples of known composition.
We sequenced 2 commercially available mock communities containing 10 microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the 10 individual species isolates were also sequenced with Illumina technology. We generated 14 and 16 gigabase pairs from 2 GridION flowcells and 150 and 153 gigabase pairs from 2 PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 and 5.4 kilobase pairs over the 4 sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total). Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning.
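For readers unfamiliar with the read length N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name `read_length_n50` and the toy read lengths are illustrative assumptions, not values or code from these datasets.

```python
from typing import Iterable


def read_length_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest reads first
    if not ordered:
        raise ValueError("need at least one read length")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first read that pushes the cumulative sum to half of
        # all bases defines the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Toy read lengths in base pairs (hypothetical, for illustration only).
toy_lengths = [2_000, 3_000, 5_000, 8_000, 12_000]
print(read_length_n50(toy_lengths))  # prints 8000
```

In the toy input the reads total 30,000 bases; the two longest reads (12,000 and 8,000) together reach 20,000, which first crosses the 15,000 halfway point, so the N50 is 8,000.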
We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.