Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea.
Gigascience. 2022 May 17;11. doi: 10.1093/gigascience/giac044.
Metagenomic assembly using high-throughput sequencing data is a powerful method for constructing microbial genomes from environmental samples without cultivation. However, metagenomic assembly is complex and challenging, especially when only short reads are available, because a metagenome is a mixture of the genomes of multiple microorganisms. Although long-read sequencing technologies have been developed and are beginning to be used for metagenomic assembly, many metagenomic studies still rely on short reads because generating long reads costs more than generating short reads.
In this study, we present a new method called PLR-GEN. Given reference genome sequences, it creates pseudo-long reads from metagenomic short reads while accounting for small sequence variations present in individual genomes of the same or different species. When applied to a mock community data set from the Human Microbiome Project, PLR-GEN extended 101-bp short reads into pseudo-long reads with an N50 of 33 Kbp and a 0.4% error rate. Using these pseudo-long reads clearly improved metagenomic assembly in terms of the number of sequences, assembly contiguity, and the prediction of species and genes.
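For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of read or contig lengths: the length L such that sequences of length L or longer contain at least half of all bases. The function name `n50` and the toy lengths are illustrative assumptions, not part of PLR-GEN or its output.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the study):
# total = 54, half = 27, and 10 + 9 + 8 = 27 reaches the halfway point.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 is weighted by bases rather than by sequence count, a set of uniform 101-bp reads has an N50 of 101, whereas a few very long pseudo-long reads can raise the value substantially.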
PLR-GEN can be used to generate artificial long-read sequences at no additional sequencing cost, thus aiding various studies that use metagenomes.