GrandOmics Biosciences, No.1, East Nengyuan Road, Beijing 102200, China.
State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Research Center for Perennial Rice Engineering and Technology of Yunnan, School of Agriculture, Yunnan University, No.2, North Cuihu Road, Kunming, Yunnan 650091, China.
Gigascience. 2020 Dec 15;9(12). doi: 10.1093/gigascience/giaa123.
The availability of reference genomes has revolutionized the study of biology. During the past decade, multiple competing technologies have been developed to improve the quality and robustness of genome assemblies. The 2 widely used long-read sequencing providers, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PacBio now produces high-throughput HiFi reads with base-level accuracy of >99%, and ONT has generated reads as long as 2 Mb. We applied both up-to-date platforms to a single rice individual and compared the 2 resulting assemblies to investigate the advantages and limitations of each.
The results showed that ONT ultralong reads delivered higher contiguity: the ONT assembly comprised a total of 18 contigs, 10 of which each spanned an entire chromosome, compared with 394 contigs, including 3 chromosome-level contigs, in the PacBio assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions. In those regions of the PacBio assembly we observed 44 falsely duplicated genes and 10 falsely lost genes, leading to over- or underestimation of the gene families located there. We also noted that the PacBio HiFi reads produced an assembly with considerably fewer errors at the level of single nucleotides and small insertions and deletions than the ONT assembly, which contained an average of 1.06 errors per kb and ultimately yielded 1,475 incorrect gene annotations through altered or truncated protein predictions.
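Error densities such as "1.06 errors per kb" are often compared on the Phred-scaled quality value (QV) scale, where QV = -10 log10(p) and p is the per-base error probability. The abstract itself does not report QV; the sketch below is only an illustrative conversion of its figures, assuming errors are counted per assembled base. The helper names `errors_per_kb_to_qv` and `accuracy_to_qv` are hypothetical. Note that the >99% figure describes HiFi read accuracy, not the accuracy of the final PacBio assembly.

```python
import math


def errors_per_kb_to_qv(errors_per_kb: float) -> float:
    """Convert an error density (errors per kilobase) to a Phred-scaled QV.

    Assumption: errors are counted per base of assembled sequence, so the
    per-base error probability is errors_per_kb / 1000.
    """
    if errors_per_kb <= 0:
        raise ValueError("error density must be positive; QV is unbounded at zero errors")
    per_base_error = errors_per_kb / 1000.0  # 1 kb = 1,000 bases
    return -10.0 * math.log10(per_base_error)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert a per-base accuracy (e.g. 0.99 for 99%) to a Phred-scaled QV."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # ONT assembly: average of 1.06 errors per kb, as reported in the abstract
    print(f"ONT assembly: QV ~ {errors_per_kb_to_qv(1.06):.1f}")
    # HiFi reads: 99% base-level accuracy corresponds to QV 20 (>99% means > Q20)
    print(f"99% accuracy: QV ~ {accuracy_to_qv(0.99):.1f}")
```

Under these assumptions, 1.06 errors per kb corresponds to roughly QV 29.7, i.e. about one error every 940 bases.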
These results show that PacBio HiFi reads and ONT ultralong reads each have their own merits. Future reference genome construction could leverage both techniques to reduce the assembly errors, and the subsequent annotation mistakes, rooted in each.