Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA.
Trends Genet. 2018 Apr;34(4):301-312. doi: 10.1016/j.tig.2017.12.005. Epub 2018 Jan 10.
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.