Mai T Tien, Lees John A, Gladstone Rebecca A, Corander Jukka
Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim 7034, Norway.
European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK.
Bioinform Adv. 2023 Mar 14;3(1):vbad027. doi: 10.1093/bioadv/vbad027. eCollection 2023.
Quantifying heritability is a fundamental goal in genetics, as it allows the contribution of additive genetic variation to the variability of a trait of interest to be assessed. The traditional computational approaches for assessing the heritability of a trait were developed in the field of quantitative genetics. However, the rise of modern population genomics with large sample sizes has led to the development of several new machine learning-based approaches to inferring heritability. In this article, we systematically summarize recent advances in machine learning that can be used to infer heritability. We focus on applying these methods to bacterial genomes, where heritability plays a key role in understanding phenotypes such as antibiotic resistance and virulence, which are particularly important given the rising frequency of antimicrobial resistance. By designing a heritability model that incorporates realistic patterns of genome-wide linkage disequilibrium for a frequently recombining bacterial pathogen, we test the performance of a wide spectrum of inference methods, including GCTA. In addition to the synthetic data benchmark, we compare the methods on antibiotic resistance traits in multiple bacterial pathogens. The benchmarking and real data analyses indicate that the performance of the different methods is highly variable, and suggest that heritability inference would likely benefit from tailoring the methods to the specific genetic architecture of the target organism.
The R code and data used in the numerical experiments are available at: https://github.com/tienmt/her_MLs.
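As an illustration of the quantity being estimated, the following minimal sketch simulates a trait with a chosen heritability, taken here as the ratio of additive genetic variance to total phenotypic variance, h2 = var(g) / (var(g) + var(e)). It is not the authors' benchmark model and is not taken from the her_MLs repository: it is written in Python rather than R, the function names and parameters are hypothetical, and variants are drawn independently, so the linkage disequilibrium patterns that the article's simulation incorporates are ignored.

```python
import random
import statistics


def simulate_trait(n_samples=2000, n_variants=200, n_causal=20, h2=0.5, seed=0):
    """Simulate binary genotypes and a trait with target heritability h2.

    Assumptions (illustrative only): variants are independent (no linkage
    disequilibrium), causal effects are Gaussian, and noise is Gaussian.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    rng = random.Random(seed)

    # Presence/absence genotypes, one frequency per variant.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]
    genotypes = [[1 if rng.random() < f else 0 for f in freqs]
                 for _ in range(n_samples)]

    # A random subset of variants carries non-zero additive effects.
    causal = rng.sample(range(n_variants), n_causal)
    beta = {j: rng.gauss(0.0, 1.0) for j in causal}

    # Additive genetic value of each sample.
    g = [sum(b * row[j] for j, b in beta.items()) for row in genotypes]

    # Choose the noise variance so that var_g / (var_g + var_e) equals h2.
    var_g = statistics.pvariance(g)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    y = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return genotypes, y, g


def realized_h2(g, y):
    """Share of phenotypic variance explained by the true genetic values."""
    return statistics.pvariance(g) / statistics.pvariance(y)


if __name__ == "__main__":
    _, y, g = simulate_trait(h2=0.5)
    print(f"realized h2: {realized_h2(g, y):.3f}")
```

In a benchmark of this kind, an inference method would see only the genotypes and the phenotype `y`; the true genetic values `g` are available only because the data are simulated, which is what allows the estimate to be compared with the target value.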