Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.
Department of Biology and York Biomedical Research Institute, University of York, United Kingdom.
Mol Biol Evol. 2019 Aug 1;36(8):1612-1623. doi: 10.1093/molbev/msz113.
The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating the limitations of comparative genomics and transcriptomics in detecting functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
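The abstract does not state the HMM's emission model, state definitions, or transition structure. The sketch below is only a rough illustration of how a five-state HMM can label windows of insertion counts. It is not the authors' model. It assumes the following:

- Poisson emissions.
- Hypothetical mean insertion rates per state, from strongly depleted (state 0) to a high-insertion hotspot or bias (state 4).
- A "sticky" transition matrix with uniform switching.
- Viterbi decoding.

The function names, rates, and counts are all hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k insertions in a window under a Poisson(lam) emission."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts, rates, stay_prob=0.99):
    """Most likely hidden-state path for per-window insertion counts.

    ASSUMPTIONS (illustrative only, not the published model): each state emits
    Poisson counts with mean rates[s]; a state persists with probability stay_prob
    and otherwise switches uniformly to any other state; the start is uniform.
    """
    if not counts:
        return []
    n_states = len(rates)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    log_start = math.log(1.0 / n_states)

    # Best log-score of any path ending in each state at the first window.
    score = [log_start + poisson_logpmf(counts[0], rates[s]) for s in range(n_states)]
    back = []  # back[t][j] = best predecessor of state j at window t + 1
    for k in counts[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + poisson_logpmf(k, rates[j]))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical per-window insertion counts: a typical stretch, a depleted stretch,
# another typical stretch, then a hotspot.
counts = [10, 12, 9, 11, 0, 0, 1, 0, 0, 10, 11, 9, 58, 62, 60]
# Hypothetical mean insertions per window for the five states.
rates = [0.2, 2.0, 10.0, 25.0, 60.0]
path = viterbi_poisson(counts, rates, stay_prob=0.9)
print(path)
```

The high self-transition probability makes isolated stray insertions, such as the single count of 1 inside the depleted stretch, too costly to split off. Depleted segments are therefore decoded as contiguous runs rather than window-by-window calls. Separating "hotspot" states from "depleted" states is one way such a model could keep insertion biases apart from true insertion depletion, as the abstract describes.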