PNNGS, a multi-convolutional parallel neural network for genomic selection.

Authors

Xie Zhengchao, Weng Lin, He Jingjing, Feng Xianzhong, Xu Xiaogang, Ma Yinxing, Bai Panpan, Kong Qihui

Affiliations

Research Center for Life Sciences Computing, Zhejiang Laboratory, Hangzhou, China.

Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China.

Publication

Front Plant Sci. 2024 Sep 3;15:1410596. doi: 10.3389/fpls.2024.1410596. eCollection 2024.

DOI: 10.3389/fpls.2024.1410596

PMID: 39290743

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11405342/

Abstract

Genomic selection (GS) can accelerate breeding compared with phenotypic selection, and improving prediction accuracy is key to promoting the use of GS. To improve the accuracy and stability of GS prediction, we introduce parallel convolution into deep learning for GS and call the resulting model the parallel neural network for genomic selection (PNNGS). In PNNGS, information passes in parallel through convolutions with different kernel sizes, and the convolutions within each branch are linked by residual connections. PNNGS is trained with four different Lp loss functions. Experiments show that the optimal number of parallel paths is 4 for rice, 6 for sunflower, 4 for wheat, and 3 for maize. Phenotypes are predicted for 24 cases using ridge-regression best linear unbiased prediction (RRBLUP), random forests (RF), support vector regression (SVR), deep neural network genomic prediction (DNNGP), and PNNGS. The serial DNNGP and the parallel PNNGS outperform the other three algorithms. On average, the prediction accuracy of PNNGS is 0.031 higher than that of DNNGP, indicating that parallelism can improve GS models. Plants are divided into clusters using principal component analysis (PCA) and the K-means clustering algorithm. Cluster sample sizes vary greatly, so the data are imbalanced. Stratified sampling improves the prediction stability and accuracy of PNNGS. When the number of training samples from small clusters is reduced, the prediction accuracy of PNNGS decreases significantly. Increasing the sample size of small clusters is critical to improving GS prediction accuracy.
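The parallel, residual branch structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel sizes (3, 5, 7, 9), the two convolution layers per branch, the single input channel, the ReLU activation, concatenation as the way branch outputs are merged, and the linear output head `predict` are all assumptions for illustration. Only the use of four branches reflects the abstract, which reports four as optimal for rice and wheat.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, kernel):
    # 1-D convolution along the marker axis; mode="same" keeps the input
    # length (zero padding at both ends), assuming an odd kernel size
    return np.convolve(x, kernel, mode="same")

def residual_branch(x, kernels):
    # One branch: a stack of convolutions of a single kernel size, each
    # wrapped in a residual connection, h <- h + relu(conv(h))
    h = x
    for kernel in kernels:
        h = h + relu(conv1d_same(h, kernel))
    return h

def parallel_block(x, branch_kernels):
    # Feed the same marker vector to every branch and concatenate the outputs
    return np.concatenate([residual_branch(x, ks) for ks in branch_kernels])

def predict(features, weights, bias=0.0):
    # Hypothetical linear output head mapping features to one phenotype value
    return float(features @ weights + bias)

rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=50).astype(float)  # 50 SNPs coded 0/1/2
kernel_sizes = [3, 5, 7, 9]                          # 4 branches, hypothetical sizes
branch_kernels = [[rng.normal(scale=0.1, size=k) for _ in range(2)]
                  for k in kernel_sizes]             # 2 conv layers per branch
features = parallel_block(markers, branch_kernels)
prediction = predict(features, rng.normal(scale=0.01, size=features.size))
print(features.shape)  # (200,)
```

In this sketch each branch sees the whole marker vector, so the kernel size sets how many neighbouring markers one convolution combines, and the residual form lets a branch pass its input through unchanged when a layer contributes nothing. `np.convolve` flips the kernel, unlike the cross-correlation used in deep learning frameworks; for learned weights this makes no practical difference.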
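The abstract does not define the four Lp losses or the accuracy metric. A common reading, assumed here, is a loss equal to the mean of |error|^p with p = 1, 2, 3, 4, and prediction accuracy measured as the Pearson correlation between observed and predicted phenotypes, as is usual in GS studies. The helper names `lp_loss` and `prediction_accuracy` are hypothetical.

```python
import numpy as np

def lp_loss(y_true, y_pred, p):
    # Mean of |error|^p: p=1 is mean absolute error, p=2 is mean squared error
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(err ** p))

def prediction_accuracy(y_true, y_pred):
    # Pearson correlation between observed and predicted phenotypes
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
for p in (1, 2, 3, 4):
    print(p, lp_loss(y_true, y_pred, p))  # prints 1 0.25, 2 0.125, 3 0.0625, 4 0.03125
print(round(prediction_accuracy(y_true, y_pred), 3))  # 0.956
```

Larger p penalizes large errors more heavily relative to small ones, which is one plausible reason to compare several Lp losses when training the same network.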
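The clustering and stratified-sampling step can be sketched under stated assumptions: PCA is computed by SVD of the centred marker matrix, K-means is plain Lloyd's algorithm, and the split draws the same fraction of test plants from every cluster. The number of principal components, the number of clusters, the 20% test fraction, the simulated genotypes, and the function names are hypothetical, not values or code from the paper.

```python
import numpy as np

def pca_scores(X, n_components=2):
    # Centre each marker column and project onto the top principal axes (SVD)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, n_iter=100, seed=0):
    # Lloyd's algorithm; initial centres are k distinct random samples
    rng = np.random.default_rng(seed)
    centres = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

def stratified_split(labels, test_frac=0.2, seed=0):
    # Draw the same fraction of test plants from every cluster, so small
    # clusters appear in both the training and the test set
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = max(1, int(round(test_frac * len(idx)))) if len(idx) > 1 else 0
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.sort(np.array(train, dtype=int)), np.sort(np.array(test, dtype=int))

# Simulated imbalanced population: 90 plants in one group, 10 in another
rng = np.random.default_rng(1)
X = np.vstack([rng.binomial(2, 0.2, size=(90, 200)),
               rng.binomial(2, 0.8, size=(10, 200))]).astype(float)
labels = kmeans(pca_scores(X), k=2)
train_idx, test_idx = stratified_split(labels, test_frac=0.2)
print(np.bincount(labels), len(train_idx), len(test_idx))
```

With cluster labels of sizes 90 and 10, the split takes 18 and 2 test plants respectively, so the small cluster is represented in both sets; a plain random split offers no such guarantee.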
