Catalan Institution for Research and Advanced Studies (ICREA), Passeig de Lluís Companys 23, 08010 Barcelona, Spain.
Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193 Bellaterra, Barcelona, Spain.
Genes (Basel). 2019 Jul 20;10(7):553. doi: 10.3390/genes10070553.
Deep learning (DL) has emerged as a powerful tool for making accurate predictions from complex data such as images, text, or video. However, its ability to predict phenotypic values from molecular data is less well studied. Here, we describe the theoretical foundations of DL and provide generic code that can be easily modified to suit specific needs. DL comprises a wide variety of algorithms, which depend on numerous hyperparameters. Careful optimization of hyperparameter values is critical to avoid overfitting. Among the DL architectures tested so far in genomic prediction, convolutional neural networks (CNNs) seem more promising than multilayer perceptrons (MLPs). A limitation of DL is the difficulty of interpreting its results. This may not be relevant for genomic prediction in plant or animal breeding, but it can be critical when assessing the genetic risk of a disease. Although DL technologies are not "plug-and-play", they are easily implemented using the publicly available Keras and TensorFlow software. To illustrate the principles described here, we provide Keras-based code on GitHub.
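As an illustration of the hyperparameter optimization mentioned above, the following is a minimal sketch of a random search written with the Python standard library only. It is not the code released with the article. The search space (`SEARCH_SPACE`), the helper names, and the toy loss function are all hypothetical; in practice `evaluate` would train a Keras model with the sampled configuration and return its loss on held-out validation data.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical search space: names and candidate values are illustrative only.
SEARCH_SPACE: Dict[str, Sequence] = {
    "architecture": ["mlp", "cnn"],
    "n_layers": [1, 2, 3],
    "n_units": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.3],
    "l2_penalty": [0.0, 1e-4, 1e-2],
}


def sample_config(space: Dict[str, Sequence], rng: random.Random) -> dict:
    """Draw one hyperparameter configuration uniformly at random."""
    return {name: rng.choice(list(values)) for name, values in space.items()}


def random_search(
    evaluate: Callable[[dict], float],
    space: Dict[str, Sequence],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[dict, float, List[Tuple[dict, float]]]:
    """Return (best_config, best_loss, history) over n_trials random draws.

    `evaluate` must return a validation loss (lower is better) computed on
    data that were not used to fit the model, so that the selected
    configuration is not simply the one that overfits the training set.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = sample_config(space, rng)
        history.append((config, evaluate(config)))
    best_config, best_loss = min(history, key=lambda item: item[1])
    return best_config, best_loss, history


def toy_validation_loss(config: dict) -> float:
    """Made-up stand-in for 'train a network and return its validation loss'."""
    return (
        abs(config["n_units"] - 64) / 64
        + abs(config["dropout"] - 0.1)
        + config["l2_penalty"]
    )


if __name__ == "__main__":
    best, loss, _ = random_search(toy_validation_loss, SEARCH_SPACE, n_trials=50)
    print(best, round(loss, 4))
```

Random search is used here only because it is simple to write down; a grid search or a more elaborate optimizer could be substituted without changing the interface, as long as the score comes from validation data.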