Department of Computational Biology, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA.
Department of Biological Sciences, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA.
J Bioinform Comput Biol. 2021 Dec;19(6):2140013. doi: 10.1142/S0219720021400138. Epub 2021 Nov 19.
The exon shuffling theory posits that intronic recombination creates new domain combinations, facilitating the evolution of novel protein function. This theory predicts that introns will be preferentially situated near domain boundaries. Many studies have sought evidence for exon shuffling by testing whether introns coincide with domain boundaries more often than expected under chance intron positioning. Here, we present an empirical investigation of how the choice of null model influences significance. Although genome-wide studies have used a uniform null model exclusively, more realistic null models have been proposed for single-gene studies. We extended these models for genome-wide analyses and applied them to 21 metazoan and fungal genomes. Our results show that, compared with the other two models, the uniform model does not recapitulate genuine exon lengths, dramatically underestimates the probability of chance agreement, and overestimates the significance of intron-domain correspondence by as much as 100 orders of magnitude. Model choice had a much greater impact on the assessment of exon shuffling in fungal genomes than in metazoan genomes, leading to different evolutionary conclusions in seven of the 16 fungal genomes tested. Genome-wide studies that use this overly permissive null model may exaggerate the importance of exon shuffling as a general mechanism of multidomain evolution.
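As an illustration of the kind of test discussed above, the following is a minimal sketch of a Monte Carlo significance test under a uniform null model, in which every position in a protein is equally likely to hold an intron. It is not the procedure used in the study: the function names, the `window` tolerance around a domain boundary, and the per-protein (rather than genome-wide) scope are all assumptions made for illustration, and the two more realistic null models are not reproduced here because the abstract does not specify them.

```python
import random


def count_near_boundaries(intron_positions, domain_boundaries, window):
    """Count introns lying within `window` residues of any domain boundary."""
    return sum(
        any(abs(p - b) <= window for b in domain_boundaries)
        for p in intron_positions
    )


def uniform_null_pvalue(protein_length, intron_positions, domain_boundaries,
                        window=5, trials=10_000, seed=0):
    """Estimate P(chance agreement >= observed) under a uniform null.

    Hypothetical helper: introns are redrawn without replacement from
    positions 1 .. protein_length - 1, each equally likely.
    """
    rng = random.Random(seed)
    observed = count_near_boundaries(intron_positions, domain_boundaries, window)
    n_introns = len(intron_positions)
    hits = 0
    for _ in range(trials):
        simulated = rng.sample(range(1, protein_length), n_introns)
        if count_near_boundaries(simulated, domain_boundaries, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    p = uniform_null_pvalue(
        protein_length=300,
        intron_positions=[100, 200],
        domain_boundaries=[100, 200],
        window=2,
        trials=2_000,
        seed=1,
    )
    print(f"uniform-null p-value: {p:.4f}")
```

Because uniform placement ignores the real distribution of exon lengths, a p-value obtained this way can be far smaller than one from a null model that preserves exon lengths, which is the overestimation of significance described above.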