Centre for Research in Agricultural Genomics (CRAG), Consejo Superior de Investigaciones Científicas (CSIC) - Institut de Recerca i Tecnologies Agroalimentaries (IRTA) - Universitat Autònoma de Barcelona (UAB) - Universitat de Barcelona (UB) Consortium, 08193 Bellaterra, Barcelona, Spain.
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824.
Genetics. 2018 Nov;210(3):809-819. doi: 10.1534/genetics.118.301298. Epub 2018 Aug 31.
The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in "deep learning" (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (∼100k individuals and ∼500k SNPs, where k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, heel bone mineral density, body mass index, systolic blood pressure, and waist-hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable to or slightly better than that of linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive with that of linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
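The SNP preselection step mentioned above (ranking markers by single-marker regression and keeping the top set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `single_marker_scores` and `preselect_snps`, the simulated genotypes, and the choice to rank by squared correlation without covariates are all assumptions made for illustration.

```python
import numpy as np


def single_marker_scores(genotypes, phenotype):
    """Squared correlation between each SNP column and the phenotype.

    For a simple regression of the phenotype on a single SNP, the squared
    correlation equals the R^2 of that fit, so ranking by it is equivalent
    to ranking by the single-marker test statistic (assumption: no
    covariates are adjusted for in this toy version).
    """
    x = genotypes - genotypes.mean(axis=0)   # centre each SNP
    y = phenotype - phenotype.mean()         # centre the phenotype
    sxx = (x ** 2).sum(axis=0)
    sxy = x.T @ y
    syy = (y ** 2).sum()
    r2 = np.zeros(genotypes.shape[1])
    polymorphic = sxx > 0                    # monomorphic SNPs keep a score of 0
    r2[polymorphic] = sxy[polymorphic] ** 2 / (sxx[polymorphic] * syy)
    return r2


def preselect_snps(genotypes, phenotype, n_keep):
    """Indices of the n_keep SNPs with the strongest marginal association."""
    scores = single_marker_scores(genotypes, phenotype)
    return np.argsort(scores)[::-1][:n_keep]


# Hypothetical toy data: 500 individuals, 200 SNPs coded 0/1/2, 3 causal SNPs.
rng = np.random.default_rng(0)
n, p = 500, 200
genotypes = rng.binomial(2, 0.3, size=(n, p)).astype(float)
causal = np.array([3, 50, 120])
effects = np.array([1.0, -1.0, 0.8])
phenotype = genotypes[:, causal] @ effects + rng.normal(0.0, 1.0, size=n)

scores = single_marker_scores(genotypes, phenotype)
top = preselect_snps(genotypes, phenotype, n_keep=10)
```

In a real analysis the ranking would normally be computed on the training individuals only, so that the preselected SNP set does not leak information from the individuals used to measure predictive performance.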