Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland 06466, Germany.
German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig 04103, Germany.
Plant Cell. 2021 Jul 19;33(6):1888-1906. doi: 10.1093/plcell/koab077.
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
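The idea behind a downsampling analysis can be illustrated with a minimal sketch: fold coverage is the total number of sequenced bases divided by the genome size, and a read set is randomly thinned until it reaches a target coverage such as 20-fold or five-fold. This is not the authors' pipeline. The function names `fold_coverage` and `downsample_reads`, the fixed random seed, and the toy genome size and read lengths are all assumptions made for illustration.

```python
import random


def fold_coverage(read_lengths, genome_size):
    """Fold coverage = total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size


def downsample_reads(read_lengths, genome_size, target_fold, seed=0):
    """Return indices of a random read subset reaching about target_fold coverage.

    Hypothetical helper: reads are visited in a seeded random order and kept
    until the cumulative base count reaches target_fold * genome_size.
    If the input has less coverage than requested, all reads are returned.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    target_bases = target_fold * genome_size
    kept, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(i)
        total += read_lengths[i]
    return kept


# Toy example (assumed values, not barley data): a 1 Mb genome and
# 2,000 reads of 15 kb each, i.e. 30-fold input coverage.
toy_genome_size = 1_000_000
toy_reads = [15_000] * 2_000

for fold in (20, 5):
    subset = downsample_reads(toy_reads, toy_genome_size, fold)
    achieved = fold_coverage([toy_reads[i] for i in subset], toy_genome_size)
    print(f"target {fold}x: kept {len(subset)} reads, achieved {achieved:.2f}x")
```

Each downsampled subset would then be assembled separately and the resulting assemblies compared; the sketch stops at read selection because the assembly and evaluation steps depend on the tools used.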