Department of Evolutionary and Environmental Biology, Institute of Evolution, University of Haifa, Haifa, 3498838, Israel.
Sci Rep. 2019 Apr 24;9(1):6480. doi: 10.1038/s41598-019-42795-6.
Diverse invertebrate taxa, including all 200,000 species of Hymenoptera (ants, bees, wasps and sawflies), have a haplodiploid sex determination system in which females are diploid and males are haploid. Hymenopteran genome projects can therefore use DNA from a single haploid male sample, which is assumed to be advantageous for genome assembly. For the purpose of gene annotation, transcriptome sequencing is usually conducted using RNA from a pool of individuals. We conducted a comparative analysis of genome and transcriptome assembly and annotation methods, using genetic sources of different ploidy: (1) DNA from a haploid male or a diploid female, and (2) RNA from the same haploid male or a pool of individuals. We predicted that using a haploid male rather than a diploid female would simplify genome assembly and gene annotation because of the lack of heterozygosity. Using DNA and RNA from the same haploid individual was expected to provide better confidence in transcript-to-genome alignment and to improve the annotation of gene structure in terms of exon/intron boundaries. The haploid genome assemblies proved to be more contiguous, with both contig and scaffold N50 sizes at least threefold greater than those of their diploid counterparts. Completeness evaluation showed mixed results. The SOAPdenovo2 diploid assembly was missing more genes than the haploid assembly. The SPAdes diploid assembly had more complete genes, but also a higher level of duplicates and a greatly overestimated genome size. When the two transcriptomes were aligned against the male genome, the male transcriptome gave 2-3% more complete transcripts than the pool transcriptome for genes with comparable expression levels in both transcriptomes. However, this advantage disappeared in the final results of the gene annotation pipeline, which incorporates evidence from homologous proteins. The RNA pool is still required to obtain the full transcriptome, including genes that are expressed in other life stages and castes.
In conclusion, using haploid source material for a de novo genome project substantially improves the quality of the genome draft, whereas using RNA from the same haploid individual for transcriptome-to-genome alignment provides only a minor advantage, limited to genes that are expressed in the adult male.
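The contiguity comparison above is stated in terms of contig and scaffold N50. As a reminder of what that metric measures, the following is a minimal sketch of the standard N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the contig lengths in the usage lines are hypothetical illustrations, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assemblies of equal total length (200 units each);
# these numbers are made up for illustration only.
fragmented = [20] * 10          # many short contigs
contiguous = [90, 60, 30, 20]   # fewer, longer contigs

ratio = n50(contiguous) / n50(fragmented)
```

With these toy inputs the more contiguous assembly has the larger N50 even though both assemblies have the same total length, which is why N50 is used as a contiguity measure rather than a size measure.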