Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
Genome Res. 2023 May;33(5):729-740. doi: 10.1101/gr.277515.122. Epub 2023 May 1.
Understanding the genetic causes of trait variation is a primary goal of genetic research. One way that individuals can vary genetically is through variable pangenomic genes: genes that are only present in some individuals in a population. The presence or absence of entire genes could have large effects on trait variation. However, variable pangenomic genes can be missed in standard genotyping workflows, owing to reliance on aligning short-read sequencing data to reference genomes. A popular method for studying the genetic basis of trait variation is linkage mapping, which identifies quantitative trait loci (QTLs), regions of the genome that harbor causative genetic variants. Large-scale linkage mapping in the budding yeast has found thousands of QTLs affecting myriad yeast phenotypes. To enable the resolution of QTLs caused by variable pangenomic genes, we used long-read sequencing to generate highly complete de novo genome assemblies of 16 diverse yeast isolates. With these assemblies, we resolved QTLs for growth on maltose, sucrose, and raffinose and for growth under oxidative stress to specific genes that are absent from the reference genome but present in the broader yeast population at appreciable frequency. Copies of genes also duplicate onto chromosomes where they are absent in the reference genome, and we found that these copies generate additional QTLs whose resolution requires pangenome characterization. Our findings show the need for highly complete genome assemblies to identify the genetic basis of trait variation.