Department of Medicine, University of California San Diego, La Jolla, California, United States of America.
PLoS Comput Biol. 2021 Sep 17;17(9):e1009373. doi: 10.1371/journal.pcbi.1009373. eCollection 2021 Sep.
Despite the growing constellation of genetic loci linked to common traits, these loci have yet to account for most heritable variation, and most act through poorly understood mechanisms. Recent machine learning (ML) systems have used hierarchical biological knowledge to associate genetic mutations with phenotypic outcomes, yielding substantial predictive power and mechanistic insight. Here, we use an ontology-guided ML system to map single nucleotide variants (SNVs), focusing on 6 classic phenotypic traits in natural yeast populations. The 29 identified loci are largely novel and account for ~17% of the phenotypic variance, versus <3% for standard genetic analysis. Representative results show that sensitivity to hydroxyurea is linked to SNVs in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. This work demonstrates a knowledge-based approach to amplifying and interpreting signals in population genetic studies.