Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil.
Laboratório Central de Tecnologias de Alto Desempenho (LaCTAD), Universidade Estadual de Campinas, Campinas, SP, Brazil.
DNA Res. 2019 Jun 1;26(3):205-216. doi: 10.1093/dnares/dsz001.
The Polyploid Gene Assembler (PGA), developed and tested in this study, is a new strategy for gene-space assembly of complex genomes from low-coverage DNA sequencing. The pipeline combines reference-assisted locus assembly with de novo assembly to construct high-quality sequences focused on gene content. The pipeline was validated on wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as the reference. This identified more than 90% of genes, as well as several new genes. PGA was also used to assemble the gene content of Saccharum spontaneum, a parental lineage of hybrid sugarcane cultivars. The resulting S. spontaneum gene sequences were used for a reference-guided transcriptome analysis of six different tissues. A total of 39,234 genes were identified, of which 60.4% clustered into known grass gene families. Thirty-seven gene families were expanded relative to other grasses. Three of these stand out for their high number of gene copies and are potentially involved in initial development and stress response. In addition, 3,108 promoters, many showing tissue specificity, were identified in this work. In summary, PGA can reconstruct high-quality gene sequences from polyploid genomes, as shown for wheat and S. spontaneum. It is also more efficient than conventional genome assemblers when working with low-coverage DNA sequencing.