Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions.

Author information

BINUS Graduate Program, Bina Nusantara University, Jakarta, 11480, Indonesia.

School of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia.

Publication information

Sci Rep. 2022 Aug 15;12(1):13823. doi: 10.1038/s41598-022-16075-9.

Abstract

As the fourth most populous country in the world, Indonesia must increase its annual rice production rate to achieve national food security by 2050. One possible solution comes from the nanoscopic level: a type of genetic variant called a Single Nucleotide Polymorphism (SNP), which can mark significant yield-associated genes. The prior benchmark for this study used a statistical genetics model that involved neither SNP position information nor an attention mechanism. To address these limitations, we developed a novel deep polygenic neural network, named the NucleoNet model. The NucleoNets were constructed by combining several prominent components: positional SNP encoding, the context vector, wide models, Elastic Net, and Shannon's entropy loss. This polygenic modeling reached a Mean Squared Error (MSE) of 2.779 with a Symmetric Mean Absolute Percentage Error (SMAPE) of 47.156%, while revealing 15 new important SNPs. Furthermore, the NucleoNets reduced the MSE by up to 32.28% compared to the Ordinary Least Squares (OLS) model. Through the ablation study, we learned that combining the Xavier distribution for weight initialization with the Normal distribution for bias initialization yielded a more diverse set of important SNPs across the 12 chromosomes. Our findings confirmed that the NucleoNet model outperformed the OLS model and identified SNPs important to Indonesian rice yields.
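The abstract reports three quantities: MSE, SMAPE, and the relative MSE reduction against the OLS baseline. A minimal Python sketch of how such figures are commonly computed follows. It is an illustration under stated assumptions and not the authors' code: the SMAPE variant used here (absolute error divided by the mean of the absolute true and predicted values) is one common definition and may differ from the one in the paper, and the function names and toy yield values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, in percent.

    Assumed variant: mean of |t - p| / ((|t| + |p|) / 2) * 100.
    A term where both values are zero is counted as zero error.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom != 0:
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


def relative_reduction(baseline_score, model_score):
    """Percentage by which model_score is lower than baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (baseline_score - model_score) / baseline_score


if __name__ == "__main__":
    # Hypothetical yield values, for illustration only (not data from the paper).
    observed = [5.0, 6.5, 4.0, 7.2]
    baseline_pred = [3.5, 8.0, 5.5, 5.0]   # stand-in for an OLS-style baseline
    model_pred = [4.5, 7.0, 4.8, 6.4]      # stand-in for a neural model

    base_mse = mse(observed, baseline_pred)
    model_mse = mse(observed, model_pred)
    print("baseline MSE:", base_mse)
    print("model MSE:", model_mse)
    print("model SMAPE (%):", smape(observed, model_pred))
    print("MSE reduction (%):", relative_reduction(base_mse, model_mse))
```

Lower is better for both MSE and SMAPE. The reduction figure is relative to the baseline, so a result such as 32.28% describes how much lower the model's MSE is as a share of the OLS MSE, not an absolute difference.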

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1868/9378700/8ebcf9555806/41598_2022_16075_Fig1_HTML.jpg
