Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA.
Trends Genet. 2018 Oct;34(10):746-754. doi: 10.1016/j.tig.2018.07.004. Epub 2018 Aug 20.
Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.